A pipeline that runs on a cron schedule is not orchestrated. It is timed. The distinction becomes obvious the first time an upstream dependency is late, a source system is down for maintenance, or a backfill operation needs to run without interfering with the daily production schedule. A cron-based system has no concept of these conditions. It fires at the appointed time regardless of whether its inputs are ready, fails silently or loudly depending on how much error handling was written into the script, and leaves the on-call engineer to reconstruct what happened by reading logs.
Orchestration is the practice of modeling pipelines as directed acyclic graphs of dependencies, where each task knows what it depends on, can declare conditions for execution, and participates in a system that tracks state, manages retries, and provides observability without requiring engineers to instrument every pipeline individually. The DAG model makes dependency relationships explicit and machine-enforceable rather than implicit and documentation-dependent. When task B depends on task A, the orchestrator guarantees that B will not execute until A has completed successfully — not because a developer remembered to schedule them twenty minutes apart, but because the dependency relationship is encoded in the graph.
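The dependency guarantee can be sketched with Python's standard-library `graphlib`, which executes a graph in topological order. This is a minimal single-process illustration, not a production orchestrator; the task names (`extract`, `transform`, and so on) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},        # transform will not run until extract has
    "load": {"transform"},           # completed, because the edge is in the graph
    "report": {"load", "transform"},
}

def run_dag(dag, task_fns):
    """Execute tasks in dependency order; a task runs only after every
    one of its upstream tasks has completed successfully."""
    completed = []
    for task in TopologicalSorter(dag).static_order():
        # The orchestrator's guarantee, made explicit: all upstreams are done.
        assert dag[task] <= set(completed), f"{task} ran before its upstreams"
        task_fns[task]()
        completed.append(task)
    return completed

# Wire each task to a stub that records its own execution.
log = []
fns = {name: (lambda n=name: log.append(n)) for name in dag}
order = run_dag(dag, fns)
```

A real orchestrator adds state persistence, retries, and distributed workers on top of this, but the core contract is the same: order comes from the graph, not from the clock.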
Backfill operations are the test that most orchestration implementations fail. When a business rule changes and historical data needs to be reprocessed from six months ago to the present, a mature orchestration system should be able to execute that backfill in parallel, managing resource contention with production workloads, handling partial failures gracefully, and providing a clear view of progress across thousands of task instances. Most organizations discover the quality of their orchestration implementation during their first major backfill, when a naive sequential reprocessing job would take eleven days to complete and the business needs the corrected data in two.
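The properties a backfill needs (bounded parallelism so production is not starved, retries for partial failures, and visible progress across many partitions) can be sketched with `concurrent.futures`. The `process` function, the retry count, and the simulated transient failure below are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import date, timedelta

def backfill(process, start, end, max_workers=4, retries=2):
    """Reprocess one partition per day in parallel with bounded concurrency,
    retrying failed partitions and reporting which ones never succeeded."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

    def run_with_retry(day):
        for attempt in range(retries + 1):
            try:
                process(day)
                return day, None
            except Exception as exc:  # retry transient failures, then give up
                if attempt == retries:
                    return day, exc

    succeeded, failed = [], []
    # max_workers caps resource contention with production workloads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_with_retry, d) for d in days]
        for done, fut in enumerate(as_completed(futures), start=1):
            day, err = fut.result()
            (failed if err else succeeded).append(day)
            print(f"{done}/{len(days)} partitions processed")  # progress view
    return succeeded, failed

# Demo: one partition fails transiently on its first attempt, then recovers.
attempts = {}
def process(day):
    attempts[day] = attempts.get(day, 0) + 1
    if day == date(2024, 1, 3) and attempts[day] == 1:
        raise RuntimeError("transient source error")

succeeded, failed = backfill(process, date(2024, 1, 1), date(2024, 1, 5))
```

The parallelism math is the point: a sequential job at roughly 30 minutes per daily partition covers 180 days in about 90 hours, while the same work across a handful of workers fits in a business day.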
Sensor-based triggering extends orchestration beyond time into event space. Rather than running a transformation pipeline on a schedule and hoping the upstream data is ready, a sensor waits for a specific condition — a partition arriving in object storage, a record count threshold being met, a quality check completing successfully — and triggers the downstream pipeline only when that condition is satisfied. This decouples pipeline timing from pipeline dependencies, which is the architectural property that makes a data platform composable rather than brittle.
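The poll-until-ready behavior of a sensor reduces to a small loop: check a condition, sleep, repeat, and give up after a timeout. This is a bare sketch of the pattern; the `partition_landed` condition below is a stand-in for a real check such as listing a path in object storage.

```python
import time

def wait_for(condition, poke_interval=1.0, timeout=60.0):
    """Poll `condition` until it returns True, then let the downstream
    pipeline proceed; return False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True          # condition satisfied: trigger downstream
        time.sleep(poke_interval)
    return False                 # timed out: alert instead of running on stale inputs

# Simulated condition: the partition "lands" on the third poll.
polls = {"count": 0}
def partition_landed():
    polls["count"] += 1
    return polls["count"] >= 3

ready = wait_for(partition_landed, poke_interval=0.01, timeout=5.0)
```

Because the downstream pipeline runs only when `wait_for` returns `True`, its timing is derived from its dependency rather than from a fixed schedule, which is the decoupling the paragraph describes.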
