Model selection gets the attention in most machine learning conversations. Which algorithm, which architecture, which hyperparameter configuration. Feature engineering does the work. The gap between a model that achieves a respectable validation metric and a model that produces decisions a business can act on is almost always found in the features — in whether the inputs to the model actually encode the information the model needs to separate the signal it is looking for from the noise it needs to ignore.
Lag features are the most important class of features for any prediction problem with a temporal dimension, and they are the class most often missed by teams that approach machine learning from a purely statistical background rather than from domain expertise. A demand forecast that includes yesterday's sales, the sales from the same day last week, and the sales from the same day last year is operating on information directly relevant to today's prediction. A forecast that includes only the current product category and a handful of static attributes is operating on information that is at best weakly correlated with what the model actually needs to know. Feature engineering as a discipline is the practice of translating domain knowledge into model inputs.
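As a concrete sketch, those lags reduce to a few backward shifts per series. The layout assumed here is hypothetical: one row per (store_id, date) pair with columns named store_id, date, and sales, and no missing days, since a row-based shift only lines up with calendar lags when the series has no gaps.

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add yesterday / last-week / last-year sales lags per store.

    Assumes one row per (store_id, date) with no missing days, so a
    shift of k rows corresponds to a lag of k calendar days.
    """
    df = df.sort_values(["store_id", "date"]).copy()
    sales_by_store = df.groupby("store_id")["sales"]
    # shift() looks strictly backward in time, so no future information
    # leaks into the features computed for a given row.
    df["sales_lag_1"] = sales_by_store.shift(1)      # yesterday
    df["sales_lag_7"] = sales_by_store.shift(7)      # same weekday, last week
    df["sales_lag_364"] = sales_by_store.shift(364)  # 364 days keeps the weekday aligned
    return df
```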
Interaction features capture relationships between variables that neither variable encodes independently. The product of price and promotional status encodes something that neither price alone nor promotional status alone can represent: the fact that the demand response to a price change is different when a promotion is running than when it is not. These interactions are discoverable through exploratory analysis — plotting the relationship between the target variable and each feature, separately for each level of potential moderating variables — and they are the features that most often produce the largest lift when added to a model that has already incorporated the obvious main effects.
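Both halves of that workflow fit in a few lines. In the sketch below the column names price, on_promo (a 0/1 flag), and sales are assumptions, and a per-group correlation stands in for the per-level plots the text describes; the product column is the interaction feature itself.

```python
import pandas as pd

def check_promo_moderation(df: pd.DataFrame) -> None:
    """Exploratory step: does the price/demand relationship differ
    depending on whether a promotion is running?"""
    for flag, group in df.groupby("on_promo"):
        corr = group["price"].corr(group["sales"])
        print(f"on_promo={flag}: corr(price, sales) = {corr:.3f}")

def add_price_promo_interaction(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # The product term lets the model fit a different demand response to
    # price when a promotion is running than when it is not; neither
    # column alone can represent that.
    df["price_x_promo"] = df["price"] * df["on_promo"]
    return df
```

If the correlations (or the plots) differ visibly across levels of the moderating variable, the interaction is a candidate feature; if they are indistinguishable, the product term is likely adding noise.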
Target encoding is the feature engineering technique that most frequently produces models that look good in development and fail in production. When a categorical variable with high cardinality — a product ID, a store identifier, a customer segment — is encoded as the mean target value for that category in the training data, the model learns a representation that is highly specific to the training distribution. When new categories appear in production, or when the target distribution shifts for a specific category, the encoded representation is wrong in ways that are not visible from validation metrics. The discipline of feature engineering includes understanding which encoding choices are stable and which are brittle — and building the monitoring infrastructure that will catch when they become brittle in production.
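One common way to harden the encoding is sketched below, under assumed column names: compute each row's encoding out-of-fold so that no row's own target value leaks into its feature, shrink rare categories toward the global mean so a handful of rows cannot pin an extreme value to a category, and make the fallback for unseen categories explicit, because production will need the same fallback.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def target_encode_oof(df: pd.DataFrame, col: str, target: str,
                      n_splits: int = 5, smoothing: float = 20.0) -> pd.Series:
    """Out-of-fold target encoding with shrinkage toward the global mean."""
    global_mean = df[target].mean()
    encoded = pd.Series(np.nan, index=df.index, name=f"{col}_te")
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for fit_idx, enc_idx in kf.split(df):
        stats = df.iloc[fit_idx].groupby(col)[target].agg(["mean", "count"])
        # Shrink small categories toward the global mean; with few rows,
        # the category mean is mostly noise.
        shrunk = ((stats["count"] * stats["mean"] + smoothing * global_mean)
                  / (stats["count"] + smoothing))
        # Categories absent from this fold's fitting split fall back to
        # the global mean, the same fallback production should apply to
        # categories it has never seen.
        encoded.iloc[enc_idx] = (df.iloc[enc_idx][col]
                                 .map(shrunk)
                                 .fillna(global_mean)
                                 .to_numpy())
    return encoded
```

At inference time the mapping is fit once on the full training set with the same smoothing and fallback, and the monitoring the text calls for reduces to tracking two things: the rate of unseen categories hitting the fallback, and drift in the per-category target means the encoding was fit on.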
