Data quality lifecycle
The data quality lifecycle has rarely been achievable in a single tool due to the runtime constraints of traditional ETL vendors.
One library for end-to-end ingestion and transformation, with data quality and lineage built in
dlt is an open source, Pythonic ingestion library; dltHub is a commercial addition that extends dlt into transformation and other areas of the data stack.
Because dlt and dltHub together span the entire pipeline, from ingestion through a portable staging layer and into transformation, they uniquely bridge these gaps.
Instead of stitching together four or five separate tools, you write Python code that works across the entire pipeline: no glue scripts, no context lost between systems, and end-to-end lineage and metadata throughout.

The three checkpoints for data quality:
- In-flight: Check individual records as data is extracted, before loading it.
- Staging: Optionally load the data into a (possibly transient) staging area where it can be tested without breaking production.
- Destination: Check properties of the full dataset currently written to the destination.
The five pillars of data quality
dlt addresses quality across five core dimensions, offering support for implementing these checks across the entire data lifecycle.
- Structural Integrity: Does the data fit the destination schema and types?
- Semantic Validity: Does the data make business sense?
- Uniqueness & Relations: Is the dataset consistent with itself?
- Privacy & Governance: Is the data safe and compliant?
- Operational Health: Is the pipeline running correctly?

1. Structural Integrity
Does the data fit the destination schema and types?
These checks ensure incoming data conforms to the expected shape and technical types before loading, preventing broken pipelines and "garbage" tables.
| Job to be Done | dlt Solution | Learn More | Availability |
|---|---|---|---|
| Prevent unexpected columns | Schema Contracts (Frozen Mode): Set your schema to frozen to raise an immediate error if the source API adds an undocumented field. | Schema Contracts | dlt |
| Enforce data types | Type Coercion: dlt automatically coerces compatible types (e.g., string "100" to int 100) and rejects non-coercible values to ensure column consistency. | Schema | dlt |
| Fix naming errors | Normalization: dlt automatically cleans table and column names (converting to snake_case) to prevent SQL syntax errors in the destination. | Naming Convention | dlt |
| Enforce required fields | Nullability Constraints: Mark fields as nullable=False in your resource hints to drop or error on records missing critical keys. | Resource | dlt |
2. Semantic Validity
Does the data make business sense?
These checks verify the content of the data against business logic. While structural checks handle types (is it a number?), semantic checks handle meaning (is it a valid age?).
| Job to be Done | dlt Solution | Learn More | Availability |
|---|---|---|---|
| Validate logic & ranges | Pydantic Models: Attach Pydantic models to your resources to enforce logic like age > 0 or email format validation in-stream. | Schema Contracts | dlt |
| Filter bad rows | add_filter: Apply a predicate function to exclude records that don't meet criteria (e.g., lambda x: x["status"] != "deleted"). | Transform with add_map | dlt |
| Check batch anomalies | Staging Tests: Use the portable runtime (e.g., Ibis/DuckDB) to query the staging buffer. Example: "Alert if the average order value in this batch is > $10k." | Staging | dlt |
| Built-in data checks | Data Quality Checks: Use built-in checks like is_in(), is_unique(), is_primary_key() with pre-load or post-load execution, plus actions on failure (drop, quarantine, alert). | Data Quality | dlthub |
3. Uniqueness & Relations
Is the dataset consistent with itself?
These checks manage duplication and preserve relationships between different tables in your dataset.
| Job to be Done | dlt Solution | Learn More | Availability |
|---|---|---|---|
| Prevent duplicates | Merge Disposition: Define primary_key and write_disposition='merge' to automatically upsert records. dlt handles the deduping logic for you. | Incremental Loading | dlt |
| Track historical changes | SCD2 Strategy: Use write_disposition={"disposition": "merge", "strategy": "scd2"} to automatically maintain validity windows (_dlt_valid_from, _dlt_valid_to) for dimension tables. | Merge Loading | dlt |
| Link parent/child data | Automatic Lineage: dlt automatically generates foreign keys (_dlt_parent_id) when unnesting complex JSON, preserving the link between parent and child tables. | Destination Tables | dlt |
| Find orphan keys | Post-Load Assertions: Run SQL tests on the destination to identify child records missing a valid parent (e.g., Orders without Customers). | SQL Transformations | dlt |
4. Privacy & Governance
Is the data safe and compliant?
Data quality also means compliance. These features ensure sensitive data is handled correctly before it becomes a liability in your warehouse.
| Job to be Done | dlt Solution | Learn More | Availability |
|---|---|---|---|
| Mask/Hash PII | Transformation Hooks: Use add_map to hash emails or redact names in-stream. Data is sanitized in memory before it ever touches the disk. | Pseudonymizing Columns | dlt |
| Drop sensitive columns | Column Removal: Use add_map to completely remove columns (e.g., ssn, credit_card) before they ever reach the destination. | Removing Columns | dlt |
| Enforce PII Contracts | Pydantic Models: Use Pydantic schemas to strictly define and detect sensitive fields (e.g., EmailStr), ensuring they are caught and hashed before loading. | Schema Contracts | dlt |
| Join on private data | Deterministic Hashing: Use a secret salt via dlt.secrets to deterministically hash IDs, allowing you to join tables on "User ID" without exposing the actual user identity. | Credentials Setup | dlt |
| Track PII through transformations | Column-Level Hint Forwarding: PII hints (e.g., x-annotation-pii) are automatically propagated through SQL transformations, so downstream tables retain knowledge of sensitive origins. | Transformations | dlthub |