Every A/B test whose result contradicted the analysis that motivated it, every marketing attribution debate that never reaches a conclusion, and every pricing model that looked good in analysis but underperformed in deployment has the same root cause: confusing correlation with causation. The correlation is real. The data shows it clearly. But the correlation exists because both variables are influenced by a third variable that was never measured, or because the relationship runs in the opposite direction from the assumed mechanism, or because the historical data was collected from a distribution that differs from the one in which the model is being deployed.
The fundamental challenge of causal inference in business settings is that most of the data organizations have access to is observational — collected not through deliberate experimentation but through the normal operation of the business. Customers who received a promotional discount did not receive it randomly. The customers most likely to churn were targeted for a retention intervention. The products that were promoted were selected because they already showed strong demand signals. In all of these cases, selection confounds the effect of the treatment with the pre-existing characteristics of the units chosen for treatment. Naive regression of the outcome on the treatment variable estimates a correlation. It does not estimate a causal effect.
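A small simulation makes the point concrete. The scenario below is hypothetical and the numbers are invented: a retention discount is targeted at high-churn-risk customers, the true effect of the discount on spend is +5, but high-risk customers spend less at baseline. The naive difference in means is badly biased; a regression that adjusts for the confounder recovers the true effect — but only because, in this toy example, the confounder happens to be measured.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: churn risk drives both who gets
# the discount (treatment) and baseline spend (outcome).
risk = rng.normal(0, 1, n)
treated = (rng.random(n) < 1 / (1 + np.exp(-2 * risk))).astype(float)
spend = 50 - 10 * risk + 5 * treated + rng.normal(0, 5, n)  # true effect: +5

# Naive estimate: difference in mean outcomes between treated and control.
# Because treated customers are riskier (lower baseline spend), this is
# biased far below +5 -- the discount even looks harmful.
naive = spend[treated == 1].mean() - spend[treated == 0].mean()

# Adjusted estimate: OLS of spend on treatment and the measured confounder.
X = np.column_stack([np.ones(n), treated, risk])
beta, *_ = np.linalg.lstsq(X, spend, rcond=None)

print(f"naive difference in means: {naive:+.2f}")
print(f"regression-adjusted:       {beta[1]:+.2f}")  # close to the true +5
```

The naive estimate fails not because the arithmetic is wrong but because treatment assignment depends on a variable that also determines the outcome; adjustment only rescues the estimate when that variable is observed, which in real observational data it often is not.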
Randomized experimentation is the gold standard for causal estimation precisely because randomization breaks the relationship between treatment assignment and pre-existing characteristics. When the assignment mechanism is random, the treatment and control groups are equivalent in expectation on all observable and unobservable characteristics, and the difference in outcomes can be attributed to the treatment rather than to selection. The design of a valid experiment requires careful attention to the unit of randomization, the sample size required for adequate statistical power, the duration needed to observe the outcome of interest, and the potential for interference between treatment and control units — problems that are straightforward to state and non-trivial to solve in practice.
Difference-in-differences, instrumental variables, and regression discontinuity are the quasi-experimental methods that extract causal estimates from observational data when randomization is not possible. Each method relies on a different identification strategy — a parallel trends assumption, an instrument that affects treatment but not outcomes directly, a threshold that creates quasi-random variation in treatment assignment — and each assumption can be tested and defended to varying degrees. The discipline of causal inference is not about finding a method that produces a clean answer. It is about being explicit about what assumptions are required for any given estimate to have a causal interpretation, and being honest about the conditions under which those assumptions are likely to hold.
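Difference-in-differences is the simplest of the three to sketch. In the hypothetical simulation below (the scenario and numbers are invented for illustration), treated and control groups differ in level and share a common time trend — the parallel-trends assumption — and the true treatment effect is +3. Differencing twice cancels both the fixed group gap and the shared trend.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical panel: half the units adopt a new pricing model between
# the pre and post periods. Groups differ in baseline level; both share
# a +6 common time trend (parallel trends); the true effect is +3.
treated = rng.random(n) < 0.5
base = np.where(treated, 40.0, 50.0)                 # fixed group difference
pre = base + rng.normal(0, 4, n)
post = base + 6 + 3 * treated + rng.normal(0, 4, n)  # trend + effect

# DiD: (post - pre) for treated minus (post - pre) for control. The first
# difference removes the fixed group gap; the second removes the trend.
did = (post[treated] - pre[treated]).mean() - (post[~treated] - pre[~treated]).mean()
print(f"difference-in-differences estimate: {did:+.2f}")  # approx +3
```

Note that the estimate is only causal because the simulation builds in parallel trends by construction; in real data that assumption must be argued for, typically by showing the groups trended together in the pre-period.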
