Data Engineering 6 min

Medallion Architecture: Why Bronze, Silver, and Gold Layers Are Not Optional


Raw data and analytics-ready data are fundamentally different things, and conflating them in a single storage layer is the source of most data platform maintenance burden. Raw data is what arrived from the source: the exact bytes, with whatever schema the source system produced, at whatever cadence the pipeline ran. Analytics-ready data is the result of a series of deliberate transformations: cleaning, deduplication, business rule application, join resolution, and aggregation. Storing both in the same place and mixing queries against them is how organizations end up with ten different definitions of "active customer" and no canonical answer to "what was our revenue last Tuesday."

The medallion architecture formalizes the separation. The bronze layer is an immutable archive of raw ingested data. Nothing in bronze is ever modified or deleted. It is the source of truth for replay, audit, and debugging. When a transformation logic error produces wrong output in a downstream layer, the ability to reprocess from bronze, applying corrected logic to the original data, is what makes recovery deterministic rather than speculative. Organizations that skip the bronze layer discover its value the first time they need to explain to an auditor exactly what data a model was trained on.
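
The replay property can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a reference implementation: `BronzeLog`, `replay`, the `orders_api` source name, and the cents-versus-dollars bug are all hypothetical, and a real bronze layer would sit on durable, append-only storage rather than a Python list.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterator


@dataclass(frozen=True)
class BronzeRecord:
    """One raw payload exactly as received, plus ingestion metadata."""
    source: str
    ingested_at: str
    payload: str  # raw text, stored unparsed and unaltered


class BronzeLog:
    """Append-only store (hypothetical): offers append and scan, no update or delete."""

    def __init__(self) -> None:
        self._records: list[BronzeRecord] = []

    def append(self, source: str, payload: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self._records.append(BronzeRecord(source, stamp, payload))

    def scan(self) -> Iterator[BronzeRecord]:
        yield from self._records


def replay(log: BronzeLog, transform: Callable[[dict], dict]) -> list[dict]:
    """Rebuild downstream output by rerunning a transform over every raw record."""
    return [transform(json.loads(record.payload)) for record in log.scan()]


def buggy_transform(row: dict) -> dict:
    # Assumed bug for illustration: reads the amount as dollars, but the source sends cents.
    return {"order_id": row["id"], "amount_usd": float(row["amount"])}


def fixed_transform(row: dict) -> dict:
    # Corrected logic: convert cents to dollars.
    return {"order_id": row["id"], "amount_usd": int(row["amount"]) / 100}


log = BronzeLog()
log.append("orders_api", '{"id": "A1", "amount": "1999"}')
log.append("orders_api", '{"id": "A2", "amount": "500"}')

wrong = replay(log, buggy_transform)      # what the downstream layer held before the fix
corrected = replay(log, fixed_transform)  # same raw input, corrected logic
```

Because the raw payloads were never touched, the corrected output is derived from exactly the same input that produced the wrong output, so the fix can be verified record by record.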

The silver layer is where cleaning and standardization happen. Field names are normalized, data types are enforced, duplicates are resolved, and records from multiple sources are joined into coherent entities. A customer record in silver is not the raw CRM export. It is the resolved entity that combines CRM data, transactional history, and support interactions into a single consistent view, with a documented lineage trail showing exactly which source records contributed to it. Silver is the layer that analytical models consume. It changes when business rules change, not when source systems change.
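
As a rough sketch of what that resolution step involves, the following Python builds one customer entity from three sources while recording lineage. Every field name here (`CustID`, `Email`, `customer`, `cust_id`, `ticket_id`) is an assumed source schema invented for the example, and matching on a shared customer ID is a simplification; real entity resolution is usually harder.

```python
from dataclasses import dataclass, field


@dataclass
class SilverCustomer:
    """Resolved customer entity combining several sources."""
    customer_id: str
    email: str = ""
    lifetime_orders: int = 0
    open_tickets: int = 0
    lineage: list[str] = field(default_factory=list)  # contributing source records


def build_silver(crm_rows, order_rows, ticket_rows) -> dict[str, SilverCustomer]:
    customers: dict[str, SilverCustomer] = {}

    def entity(raw_id) -> SilverCustomer:
        # Enforce one type and format for the key, whatever the source sent.
        customer_id = str(raw_id).strip()
        return customers.setdefault(customer_id, SilverCustomer(customer_id))

    for index, raw in enumerate(crm_rows):
        cust = entity(raw["CustID"])
        cust.email = raw["Email"].strip().lower()  # normalize; the latest CRM row wins
        cust.lineage.append(f"crm:{index}")

    seen_orders: set[str] = set()
    for raw in order_rows:
        order_id = str(raw["order_id"])
        if order_id in seen_orders:  # drop duplicate deliveries of the same order
            continue
        seen_orders.add(order_id)
        cust = entity(raw["customer"])
        cust.lifetime_orders += 1
        cust.lineage.append(f"orders:{order_id}")

    for raw in ticket_rows:
        cust = entity(raw["cust_id"])
        if raw["status"] == "open":
            cust.open_tickets += 1
        cust.lineage.append(f"support:{raw['ticket_id']}")

    return customers


crm = [
    {"CustID": " 42 ", "Email": "Ana@Example.COM "},
    {"CustID": 42, "Email": "ana@example.com"},
]
orders = [
    {"order_id": "A1", "customer": "42"},
    {"order_id": "A1", "customer": "42"},  # duplicate delivery
    {"order_id": "A2", "customer": 42},
]
tickets = [{"ticket_id": "T9", "cust_id": 42, "status": "open"}]

silver = build_silver(crm, orders, tickets)
ana = silver["42"]
```

The `lineage` list is the part that tends to get skipped in practice. Keeping it on the entity means a question about where a value came from can be answered from silver itself, by following the identifiers back to the raw records.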

The gold layer is purpose-built for consumption. Aggregated metrics, pre-joined reporting tables, feature stores for machine learning: gold is optimized for the access patterns of its specific consumers. A gold table for a weekly executive dashboard looks nothing like a gold feature table for a real-time churn model, even if both draw from the same silver entities. The separation of gold from silver prevents the optimization choices made for one consumer from degrading the experience of another. It also makes the cost of analytical workloads visible and attributable, which is the first step toward managing them.
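
To make the contrast concrete, here is a small Python sketch that derives two differently shaped gold outputs from the same silver rows. The `SILVER_ORDERS` data, the function names, and the chosen features are illustrative assumptions; the point is only that one output is pre-aggregated by week for scanning, while the other is keyed by customer for point lookups.

```python
from collections import defaultdict
from datetime import date

# Hypothetical silver-level order rows shared by both consumers.
SILVER_ORDERS = [
    {"customer_id": "42", "order_date": date(2024, 3, 4), "amount_usd": 20.0},
    {"customer_id": "42", "order_date": date(2024, 3, 6), "amount_usd": 5.0},
    {"customer_id": "7", "order_date": date(2024, 3, 12), "amount_usd": 12.5},
]


def gold_weekly_revenue(orders: list[dict]) -> list[dict]:
    """Dashboard table: one pre-aggregated row per ISO week, sorted for display."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for order in orders:
        iso = order["order_date"].isocalendar()
        totals[(iso[0], iso[1])] += order["amount_usd"]
    return [
        {"iso_year": year, "iso_week": week, "revenue_usd": total}
        for (year, week), total in sorted(totals.items())
    ]


def gold_churn_features(orders: list[dict], as_of: date) -> dict[str, dict]:
    """Feature table: one row per customer, keyed for fast lookup at scoring time."""
    features: dict[str, dict] = {}
    for order in orders:
        row = features.setdefault(
            order["customer_id"],
            {"order_count": 0, "days_since_last_order": None},
        )
        row["order_count"] += 1
        age_days = (as_of - order["order_date"]).days
        if row["days_since_last_order"] is None or age_days < row["days_since_last_order"]:
            row["days_since_last_order"] = age_days
    return features


weekly = gold_weekly_revenue(SILVER_ORDERS)
features = gold_churn_features(SILVER_ORDERS, as_of=date(2024, 3, 15))
```

Neither function modifies the silver rows, so a change to the dashboard's grain or to the model's feature set stays local to its own gold table.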