
feat: `Dataset.write()`

Open zilto opened this issue 2 months ago • 3 comments

Users of dlt.Dataset want a simple way to write data back to the dataset.

Use cases:

  • manually review data and push corrected records
  • simple way to add records if you don't have access to the original dlt.Pipeline used to create the dataset

Other motivations

This interface will simplify data-centric operations involved in:

  • storing data quality check results on the destination
  • creating a graph of datasets where the dataset's "internal pipeline" is used
  • integrating with orchestration frameworks

Specs

  • Look at WritableDataset.save() from dlt-plus
  • Add Dataset.write() in dlt (this aligns with the pipeline.run() operation)
    • Alternatives: .write_to(), .load_into(), .load_table()
  • create an internal dlt.Pipeline named _dlt_dataset_{dataset_name}
  • find a way for the internal pipeline to use the dlt.Schema from the dlt.Dataset instance, so that the schema evolves when Dataset.load() is used
  • potential API
    def write(
        self: dlt.Dataset,
        data: TDataItems,
        *,
        table_name: str,
        write_disposition: TWriteDisposition = "append",
        normalize: bool = False,
    ) -> LoadInfo: ...
    
    • write_disposition determines whether we append new records or modify existing ones
    • normalize lets the user enable normalization (which might create additional tables)
  • can accept a dlt.Relation as input (see the sketch after this list)
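
A minimal sketch of how write() could be wired to an internal pipeline. The dlt.pipeline() and pipeline.run() arguments are real; the _destination and _dataset_name accessors are assumptions, and normalize=False is only approximated here with a frozen table contract:

    import dlt
    from dlt.common.pipeline import LoadInfo
    from dlt.common.typing import TDataItems
    from dlt.common.schema.typing import TWriteDisposition

    def write(
        self,  # dlt.Dataset
        data: TDataItems,
        *,
        table_name: str,
        write_disposition: TWriteDisposition = "append",
        normalize: bool = False,
    ) -> LoadInfo:
        # Internal pipeline named after the dataset, as specified above.
        # `_destination` and `_dataset_name` are hypothetical accessors.
        pipeline = dlt.pipeline(
            pipeline_name=f"_dlt_dataset_{self._dataset_name}",
            destination=self._destination,
            dataset_name=self._dataset_name,
        )
        # Approximate normalize=False by freezing the table contract so the
        # normalizer cannot create new (child) tables. Note: this would also
        # block the very first load into a brand-new table.
        schema_contract = None if normalize else {"tables": "freeze"}
        return pipeline.run(
            data,
            table_name=table_name,
            write_disposition=write_disposition,
            schema_contract=schema_contract,
        )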

Out of scope

  • Dataset.load() doesn't have to support the dlt.Pipeline.run() method 1-to-1; if a user needs the full range of configuration, they should create a pipeline

zilto commented Sep 16 '25 21:09



My take would be to make the internal pipeline used in write() as invisible as possible, possibly using pipelines_dir to hide it from the command line and the dashboard. In essence, we pretend that this pipeline does not exist.

I changed the internal pipeline to be a context manager that uses a temporary directory as pipelines_dir
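
A minimal sketch of that context manager; pipelines_dir is a real dlt.pipeline() argument, while the helper name and the internal-pipeline naming follow the spec above:

    import tempfile
    from contextlib import contextmanager

    import dlt

    @contextmanager
    def _internal_pipeline(dataset_name: str, destination):
        # Keep the pipeline's working state in a throwaway directory so it
        # never shows up in `dlt pipeline` listings or the dashboard.
        with tempfile.TemporaryDirectory() as tmp_dir:
            yield dlt.pipeline(
                pipeline_name=f"_dlt_dataset_{dataset_name}",
                pipelines_dir=tmp_dir,
                destination=destination,
                dataset_name=dataset_name,
            )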

disable destination sync, state sync and schema evolution (a total freeze on a table via contract)

I don't know exactly what I need to change or configure for destination and state sync (neither seems to be among the kwargs for dlt.pipeline() or pipeline.run()).

For schema evolution, users should be able to modify the schema, for example to add a column or cast types. That said, I would make a frozen schema the default and require users to change it explicitly.
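
A sketch of that default using dlt's existing schema contracts on the internal pipeline's run() call; the surrounding write() plumbing (pipeline, data) is assumed:

    # Frozen by default: new tables, new columns, and type changes are rejected.
    info = pipeline.run(data, table_name="reviews", schema_contract="freeze")

    # Explicit opt-in to evolution, e.g. when the user adds a column or casts a type.
    info = pipeline.run(data, table_name="reviews", schema_contract="evolve")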

zilto commented Sep 20 '25 01:09

Just had a meeting with Marcin. The plan is now:

  • let's focus on Dataset.write() as a function to create standalone Datasets (vs. using the dataset from the pipeline)
  • the signature is Dataset.write(data, table_name: str, overwrite: bool)
  • overwrite=False respects the write disposition of the underlying data
  • overwrite=True does refresh="drop_resources" and writes the new schema into _dlt_version
  • document in the docs that if users take the Dataset from the pipeline and introduce schema changes (a new table (append) or an overwrite), they need to call pipeline.sync_destination() to pull the changes into the pipeline (see the sketch after this list)
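
A sketch of how overwrite could map onto existing pipeline.run() options; refresh="drop_resources" is a real run() argument, while the _internal_pipeline helper (sketched earlier) and the attribute names are assumptions:

    def write(self, data, table_name: str, overwrite: bool = False):
        with _internal_pipeline(self._dataset_name, self._destination) as pipeline:
            if overwrite:
                # Drop the target table(s) and their resource state, load
                # fresh, and record the new schema version in _dlt_version.
                return pipeline.run(data, table_name=table_name, refresh="drop_resources")
            # Respect the write disposition carried by the data (append by default).
            return pipeline.run(data, table_name=table_name)

A pipeline that originally produced the dataset would then call pipeline.sync_destination() to pull these schema changes, per the documentation note above.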

Follow-up:

  • pipeline sync should be aware that schema sync needs to be decoupled from state sync: at the moment, new schemas won't automatically be picked up by a pipeline, because the pipeline compares local state to remote state, and the state doesn't reference the schema (the _dlt_version table)

djudjuu commented Nov 27 '25 13:11

Also: pipeline.sync_destination() only syncs if the remote state version has changed, which it hasn't after dataset.write(). In that case it calls get_schema_from_destination(always_download=False), which just looks up the schema in local storage (where it hasn't changed). Maybe we should introduce a force_download flag on the sync_destination call :thinking: otherwise the schema sync won't happen.
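
A sketch of the proposed call; the force_download keyword is hypothetical and does not exist today:

    # Today: effectively a no-op for the schema, because the remote state
    # version is unchanged and the locally cached schema is returned.
    pipeline.sync_destination()

    # Proposed: bypass the version check and re-download the schema from the
    # destination's _dlt_version table.
    pipeline.sync_destination(force_download=True)  # hypothetical flag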

djudjuu commented Dec 01 '25 14:12