Most organizations do not have a data problem. They have an expectation problem.
A finance partner builds a report that “has always worked” and suddenly a column goes null. A growth team launches a new segment and the definition of “active customer” quietly changes upstream. A machine learning feature starts drifting because a source system switched time zones. Nobody intended to break anything, but nobody explicitly promised anything either.
That gap between what consumers assume and what producers actually deliver is where trust goes to die.
Data contracts close that gap. They turn datasets into products with an explicit handshake: what the data means, what shape it takes, how often it updates, what quality guarantees exist, and what happens when change is needed. Done well, contracts become the rails that help teams ship faster with fewer surprises, not another governance tax.
This post lays out a practical approach: one you can start in a single domain and scale without turning it into a bureaucracy.
A data contract is a shared agreement between a producer and a consumer that defines:

- Semantics: what the data means and how fields should be interpreted
- Schema: what shape the data takes
- Freshness: how often it updates
- Quality: what guarantees exist and how they are checked
- Change management: what happens when change is needed
If you have ever said “marketing changed something” or “the warehouse is wrong,” what you really mean is: expectations were implicit, and the system had no enforced boundary.
This is why data contracts are often described as “shifting left” data quality and governance: you move checks and accountability closer to where data is produced, instead of discovering problems after the fact in dashboards and downstream models.
A good contract does not try to predict every future use case. It does one thing: it makes the current critical use cases safe and repeatable.
The fastest way to make data contracts real is to treat them like an API spec:

- Written down where producers and consumers can both see it
- Versioned, so change is explicit rather than silent
- Validated automatically, so violations surface before consumers feel them

A practical thin slice is:

- One critical dataset with one named owner
- A one-page human-readable contract, plus the same essentials in machine-readable form
- One automated check in the pipeline that fails loudly on violation
That is enough to change behavior.
Capture the contract in two forms:

- Human-readable: a short page or markdown section that answers: what does this dataset mean, who owns it, how often does it update, and what can consumers rely on?
- Machine-readable: YAML or JSON that tooling can validate. The Open Data Contract Standard (ODCS) is one structured option here, designed to make contracts portable across tools. You do not have to adopt a standard on day one, but you should capture the same essentials so you can automate enforcement.
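As a concrete illustration, here is a minimal sketch of a machine-readable contract. The field names are illustrative, not ODCS-conformant, and the dataset, owner, and columns are made up; the point is that the same essentials a human reads are also available to tooling. Assumes PyYAML is installed.

```python
import yaml  # PyYAML: pip install pyyaml

CONTRACT_YAML = """
dataset: orders.fct_orders
owner: checkout-team@example.com
version: 1.2.0
update_frequency: hourly
columns:
  - name: order_id
    type: string
    nullable: false
  - name: status
    type: string
    nullable: false
    allowed_values: [placed, paid, shipped, cancelled]
  - name: placed_at
    type: timestamp
    timezone: UTC
"""

contract = yaml.safe_load(CONTRACT_YAML)

# Because the contract is structured data, basic enforcement is one check away:
REQUIRED_KEYS = {"dataset", "owner", "version", "update_frequency", "columns"}
missing = REQUIRED_KEYS - contract.keys()
assert not missing, f"contract is missing required keys: {missing}"
```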
These four items eliminate an enormous amount of downstream pain (the sketch after this list shows each as a runnable check):

- Schema: the columns and types consumers can rely on
- Nullability: which columns may ever be null
- Enumerations: the closed set of values a categorical column may take
- Time zones: the explicit zone every timestamp is recorded in
Time zone ambiguity is one of the most common sources of subtle, expensive errors because everything looks fine until you reconcile at month-end.
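Here is a minimal sketch of all four as checks over a batch of rows. The column shapes mirror the illustrative YAML above; none of this is a particular tool's API.

```python
from datetime import datetime, timezone

# Illustrative contract columns, mirroring the YAML sketch above.
CONTRACT_COLS = [
    {"name": "order_id", "type": "string", "nullable": False},
    {"name": "status", "type": "string", "nullable": False,
     "allowed_values": ["placed", "paid", "shipped", "cancelled"]},
    {"name": "placed_at", "type": "timestamp", "timezone": "UTC"},
]

def check_batch(rows, contract_cols):
    """Return a list of human-readable contract violations for a batch."""
    errors = []
    for i, row in enumerate(rows):
        for col in contract_cols:
            name = col["name"]
            # Schema: every contracted column must be present.
            if name not in row:
                errors.append(f"row {i}: missing column {name}")
                continue
            value = row[name]
            # Nullability: nulls only where the contract allows them.
            if value is None:
                if not col.get("nullable", True):
                    errors.append(f"row {i}: {name} is null, contract says non-null")
                continue
            # Enumerations: values must come from the agreed closed set.
            allowed = col.get("allowed_values")
            if allowed and value not in allowed:
                errors.append(f"row {i}: {name}={value!r} not in {allowed}")
            # Time zones: reject naive timestamps when the contract pins a zone.
            if col.get("timezone") and isinstance(value, datetime) and value.tzinfo is None:
                errors.append(f"row {i}: {name} is naive, contract expects {col['timezone']}")
    return errors

batch = [
    {"order_id": "o-1", "status": "paid",
     "placed_at": datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)},  # clean
    {"order_id": None, "status": "refunded",
     "placed_at": datetime(2024, 3, 1, 12, 5)},                       # three violations
]
for e in check_batch(batch, CONTRACT_COLS):
    print(e)
```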
Once you have one contract in production, scaling is less about writing more documents and more about making contracts the default interface for data change.
The single most important scaling move is to define “what counts as breaking.”
Most teams get stuck because every change triggers fear. Contracts solve that by making change explicit and testable.
A good compatibility policy usually includes:

- A definition of breaking versus non-breaking changes (removing or retyping a column breaks consumers; adding an optional column does not)
- A required notice period and deprecation window for breaking changes
- A versioning rule, so consumers can pin what they depend on
If you are using streaming or shared schemas, formal compatibility modes like backward, forward, and full compatibility are well-defined patterns that many teams use to keep producers and consumers in sync.
Even if you are not on Kafka, the idea translates: “Will old consumers still work when the producer changes?” and “Will new consumers still read historical data?” Write those answers down, then enforce them.
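A minimal sketch of that first question as code, assuming schemas are available as the same illustrative column dicts used above. The policy encoded here (removals and type changes break; additive nullable columns do not) is one common choice, not the only one.

```python
def breaking_changes(old_cols, new_cols):
    """Classify a proposed schema change for backward compatibility."""
    old = {c["name"]: c for c in old_cols}
    new = {c["name"]: c for c in new_cols}
    breaks = []
    for name, old_col in old.items():
        if name not in new:
            breaks.append(f"removed column: {name}")  # old readers break
            continue
        new_col = new[name]
        if new_col["type"] != old_col["type"]:
            breaks.append(f"type change on {name}: {old_col['type']} -> {new_col['type']}")
        if old_col.get("nullable") is False and new_col.get("nullable", True):
            breaks.append(f"{name} became nullable")  # non-null assumptions break
    # Columns that exist only in `new` are additive: backward-compatible here.
    return breaks

old = [{"name": "order_id", "type": "string", "nullable": False}]
new = [{"name": "order_id", "type": "int", "nullable": False},
       {"name": "channel", "type": "string", "nullable": True}]
print(breaking_changes(old, new))  # flags the type change, allows the added column
```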
As you expand beyond one dataset, you want reusable assertions that teams recognize instantly. Focus on a small set that maps to real failure modes (a minimal registry sketch follows the list):

- The four from your first contract: schema, nullability, enumerations, time zones
- Freshness: the data landed within the promised window
- Volume: row counts within an expected range, so partial loads cannot pass silently
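One way to make the vocabulary shared in practice is a small registry of named checks that any contract can reference. A minimal sketch, with illustrative names and deliberately simple semantics:

```python
# Named assertions any contract can reference; each takes the column's values
# plus an optional parameter and returns pass/fail.
ASSERTIONS = {
    "not_null": lambda values, _: all(v is not None for v in values),
    "in_set": lambda values, allowed: all(v in allowed for v in values if v is not None),
    "min_rows": lambda values, n: len(values) >= n,
}

def run_assertion(name, values, param=None):
    ok = ASSERTIONS[name](values, param)
    return f"{name}: {'pass' if ok else 'FAIL'}"

statuses = ["paid", "shipped", "refunded"]
print(run_assertion("not_null", statuses))                                            # pass
print(run_assertion("in_set", statuses, {"placed", "paid", "shipped", "cancelled"}))  # FAIL
```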
Notice what is not on the list: “looks right on the dashboard.” Dashboards are not enforcement. Dashboards are reporting.
The objective is predictable reliability, not pretty charts.
Dashboards do matter, just not as the first line of defense.
As you scale, build shared visibility:

- A status page per dataset: which contract checks pass, which fail, and since when
- SLO attainment over time, not just point-in-time pass/fail
- A named owner next to every dataset, so a red check is never an orphan
The dashboard becomes an operational artifact. It is how you run the data product, not how you discover issues three weeks later during a business review.
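The rollup behind such a view can be trivially simple: one row per dataset, latest check results, and who to page. A sketch with hard-coded results that would normally come from wherever your checks log:

```python
from collections import defaultdict

# Latest check results; datasets, owners, and check names are illustrative.
check_results = [
    {"dataset": "orders.fct_orders", "owner": "checkout-team", "check": "not_null:order_id", "ok": True},
    {"dataset": "orders.fct_orders", "owner": "checkout-team", "check": "freshness<2h", "ok": False},
    {"dataset": "crm.dim_customer", "owner": "growth-team", "check": "in_set:status", "ok": True},
]

rollup = defaultdict(lambda: {"ok": 0, "fail": 0, "owner": None})
for r in check_results:
    row = rollup[r["dataset"]]
    row["owner"] = r["owner"]
    row["ok" if r["ok"] else "fail"] += 1

for dataset, row in rollup.items():
    health = "healthy" if row["fail"] == 0 else f"{row['fail']} failing"
    print(f"{dataset:22} {health:12} owner={row['owner']}")
```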
Here is the step most organizations skip: you define reliability targets based on business usage, not engineering preference.
An SLO is a target for a measured service level indicator (SLI). This framing comes from SRE practices and is useful because it forces clarity on what “good” means and how you will measure it.
For data contracts, common SLIs include:

- Freshness: time since the dataset last successfully updated
- Completeness: how much of the expected volume actually arrived
- Validity: the share of rows that pass the contract's assertions
Then set SLOs based on decision deadlines:

- If finance closes the books at 9:00, promise complete, validated data by 7:00 on at least 99% of business days
- If a model retrains nightly, promise features no more than 24 hours stale
This is where contracts become business-aligned. You are not chasing perfection. You are guaranteeing what the business needs to operate.
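A minimal sketch of the mechanics: a freshness SLI measured over a window and compared to its SLO. The thresholds and observations are illustrative assumptions, and the observations would normally come from pipeline metadata.

```python
from datetime import timedelta

FRESHNESS_SLO = timedelta(hours=2)  # the promise: never more than 2h stale
ATTAINMENT_TARGET = 0.99            # ...kept on at least 99% of checks

# The SLI: observed staleness, measured at each check. Hard-coded here.
observed_staleness = [timedelta(minutes=m) for m in (20, 45, 70, 190, 50, 30, 95)]

attainment = sum(o <= FRESHNESS_SLO for o in observed_staleness) / len(observed_staleness)
print(f"SLO attainment: {attainment:.1%} (target {ATTAINMENT_TARGET:.0%})")
if attainment < ATTAINMENT_TARGET:
    print("Below target: reliability work outranks new features this cycle.")
```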
A practical pattern is to define:

- A small number of criticality tiers, from revenue- and compliance-critical datasets down to experimental ones
- An SLO template per tier, so new contracts inherit sensible defaults instead of negotiating from scratch
Not every dataset deserves the same rigor. Contracts help you be explicit about which is which.
You can implement data contracts and still fail if you fall into two common traps.
Dashboards are downstream. By the time a dashboard shows “freshness red,” the damage is already done:

- The report has already gone out
- The decision has already been made
- The consumer's trust has already taken the hit
Contracts should trigger upstream checks:

- Schema validation in the producer's CI, before a change merges (see the gate sketch below)
- Batch validation at load time, before data is published to consumers
- A hard block, not a warning, when the contract is violated
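A minimal sketch of that CI gate, assuming schemas are available as simple `{column: type}` mappings; the function and shapes are illustrative, not any particular tool's API.

```python
import sys

def gate(contracted: dict, proposed: dict) -> None:
    """Fail the build if the proposed schema breaks the contract."""
    breaks = [f"removed column: {c}" for c in contracted if c not in proposed]
    breaks += [f"type change on {c}: {contracted[c]} -> {proposed[c]}"
               for c in contracted if c in proposed and proposed[c] != contracted[c]]
    if breaks:
        for b in breaks:
            print(f"CONTRACT VIOLATION: {b}", file=sys.stderr)
        sys.exit(1)  # a red build here is far cheaper than a red dashboard later
    print("contract check passed")

gate({"order_id": "string", "status": "string"},
     {"order_id": "string", "status": "string", "channel": "string"})  # additive: passes
```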
Use dashboards to monitor health, not to certify correctness after the fact.
Silent failures are the most expensive kind because they create false confidence.
Examples:

- A column that quietly starts arriving null
- An upstream definition, like “active customer,” that changes without notice
- A source system that switches time zones, so everything looks fine until month-end reconciliation
- A new enumeration value that downstream logic silently drops
Contracts prevent silent failures when they are enforced as gates:

- Unknown enumeration values are rejected or quarantined, never passed through
- Unexpected nulls stop the load instead of flowing into aggregates
- Volume anomalies hold publication until someone confirms them
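A sketch of the load-time variant, reusing the illustrative column rules from the earlier example contract. The point is that violating rows become loud and attributable instead of blending into the “good” table.

```python
ALLOWED_STATUS = {"placed", "paid", "shipped", "cancelled"}

def split_batch(rows):
    """Separate contract-clean rows from rows that must not load silently."""
    good, quarantined = [], []
    for row in rows:
        violations = []
        if row.get("order_id") is None:
            violations.append("order_id is null")
        if row.get("status") not in ALLOWED_STATUS:
            violations.append(f"unknown status {row.get('status')!r}")
        if violations:
            quarantined.append((row, violations))
        else:
            good.append(row)
    return good, quarantined

good, quarantined = split_batch([
    {"order_id": "o-1", "status": "paid"},
    {"order_id": None, "status": "refunded"},  # would otherwise pass silently
])
print(f"loaded {len(good)} rows, quarantined {len(quarantined)}")
for row, why in quarantined:
    print("  quarantined:", row, "->", why)   # loud, attributable, actionable
```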
This is the difference between “data is mostly fine” and “data is dependable.”
In mature organizations, data contracts stop being a special initiative and become the way data moves:

- New datasets ship with a contract by default
- Schema changes are proposed against the contract and checked automatically, not discovered in production
- Consumers build against the contract, not against whatever the table happens to contain today
And importantly, contracts do not require a single tool. Standards like ODCS exist to help, but the real win is the operating model: clear ownership, explicit expectations, and automated enforcement.
Data contracts are not bureaucracy. They are a forcing function for clarity.
Start small. Write one contract that protects one decision.
Make it readable, make it enforceable, and make ownership explicit. Then scale by standardizing compatibility policies and a handful of assertions that prevent the failures you already live with.
When contracts are working, you feel it:

- Fewer “the warehouse is wrong” escalations
- Changes ship with notice instead of surprise
- Teams move faster because the boundaries are explicit
That is what foundations should do. They should stay out of the way while making speed safe.