dlt
1.0.0 announcement and release notes
Why 1.0?
We are releasing version 1.0.0 of dlt. Over the last 2 years dlt has become quite stable (in terms of our API, internal migrations, and major bugs being rare) and feature complete. There are so many production deployments that, combined with our obsessive approach to testing (you can always write more test cases!), we are pretty confident dlt is now "stable" and ready for production.
What is coming in the full release
- We are moving the SQL database, filesystem/buckets, and REST API sources into the core library to make them easily available, stabilize their APIs, and run tons of additional tests.
- Our documentation is getting a big update: additional tutorials on syncing databases, working with buckets and file readers, and using the REST API toolkit to declare pipelines that load data from REST APIs.
On top of that, we plan a few quick follow-up features:
- Define hints for nested tables/resources (currently only the root table can be conveniently hinted) dlt-hub/dlt#1647
- Define cross-table references dlt-hub/dlt#1713
- A SQLAlchemy destination is coming, with SQLite and MySQL fully tested (and optimized). You'll be able to bring your own settings to fine-tune other dialects (#1734 and dlt-hub/dlt#21).
- We will finally stabilize dlt traces and expose a core source and a data contract (schema), so loading dlt metadata is easy and predictable.
Deprecations and Breaking Changes
- Load packages with terminally failed jobs will be automatically aborted with an exception. Until now, users had to detect this in code (that behavior will still be available). https://github.com/dlt-hub/dlt/issues/1749
- To use the `iceberg` table format on the Athena destination, set `table_format` to `iceberg` on all your resources instead of using the `force_iceberg` flag in the destination configuration. This flag is deprecated but will still be observed for backward compatibility.
- The `complex` data type is deprecated and superseded by `json` dlt-hub/dlt#1673
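The failed-jobs change is a policy switch: a package containing terminally failed jobs now ends in an exception instead of completing quietly. The plain-Python model below shows that decision in isolation. Every name in it (`Job`, `LoadPackage`, `finalize_package`, `LoadPackageAborted`, and the `raise_on_failed_jobs` argument) is hypothetical and chosen for illustration; it is not dlt's API, so refer to issue #1749 and the dlt docs for the real exception type and opt-out setting.

```python
from dataclasses import dataclass, field
from typing import List


class LoadPackageAborted(Exception):
    """Hypothetical stand-in for the exception raised when a package is aborted."""


@dataclass
class Job:
    file_name: str
    state: str  # "completed" or "failed" (terminal failure)


@dataclass
class LoadPackage:
    load_id: str
    jobs: List[Job] = field(default_factory=list)

    def failed_jobs(self) -> List[Job]:
        return [job for job in self.jobs if job.state == "failed"]


def finalize_package(package: LoadPackage, raise_on_failed_jobs: bool = True) -> List[Job]:
    """Abort the package if any job failed terminally; otherwise return the failures.

    raise_on_failed_jobs=True models the new default; False models the old behavior,
    where the caller had to inspect the failed jobs itself.
    """
    failed = package.failed_jobs()
    if failed and raise_on_failed_jobs:
        names = ", ".join(job.file_name for job in failed)
        raise LoadPackageAborted(f"load package {package.load_id} aborted, failed jobs: {names}")
    return failed


package = LoadPackage(
    "1725000000.123",
    [Job("events.1.parquet", "completed"), Job("users.1.parquet", "failed")],
)

# Old style: opt out of the exception and check for failures yourself.
leftover = finalize_package(package, raise_on_failed_jobs=False)
print([job.file_name for job in leftover])  # ['users.1.parquet']

# New default: the same package raises.
try:
    finalize_package(package)
except LoadPackageAborted as exc:
    print(exc)
```

Keeping the opt-out as an explicit argument mirrors the note above: the exception is the default, but code that already inspects failed jobs can keep doing so.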
Internal or obscure changes:
- A few column hints (`foreign_key` and `index`) that were undocumented and had no real use will be removed.
- If a primary key was used in a nested table, linking was not created in `relational.py`. Now linking is skipped only when a nested row is fitted into a table that is not nested (does not have a parent), a rare case for someone who does not want `dlt` linking.
- Removes `generate_dlt_id` from the JSON relational normalizer config.
- Deprecates `skip_complex_types` in the `dlt` Pydantic config and asks to use `skip_nested_types` instead.
- When extracting a list of standalone resources, they will be grouped into the smallest possible number of sources (previously each resource, including transformers, was extracted in its own source) dlt-hub/dlt#1535
- secrets (TSecretValue and configs deriving from Credentials) won't be saved to trace dumps dlt-hub/dlt#1687
dlt schema engine migration
If you run this version against an existing dataset in a destination, the schema in `_dlt_version` will be migrated to engine v10. The same applies to the local pipeline working dir. You can restore the old schema by deleting the migrated version from the version table.
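Restoring the previous schema amounts to deleting the rows that the migration wrote to `_dlt_version`. The sketch below demonstrates the idea on an in-memory SQLite database with Python's standard `sqlite3` module. The column names (`version`, `engine_version`, `inserted_at`, `schema_name`, `version_hash`, `schema`) reflect our understanding of dlt's version table and are an assumption; check them against your own destination before running a `DELETE` there. The sample rows and the `drop_migrated_schema` helper are made up for illustration.

```python
import sqlite3

# In-memory stand-in for a destination dataset. The column layout is an
# assumption modeled on dlt's _dlt_version table; verify it in your destination.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE _dlt_version (
        version INTEGER,
        engine_version INTEGER,
        inserted_at TEXT,
        schema_name TEXT,
        version_hash TEXT,
        schema TEXT
    )
    """
)
# Made-up rows: one stored by the old engine (v9), one written by the migration (v10).
conn.executemany(
    "INSERT INTO _dlt_version VALUES (?, ?, ?, ?, ?, ?)",
    [
        (3, 9, "2024-08-01T10:00:00", "my_source", "hash_v9", "{}"),
        (3, 10, "2024-09-02T10:00:00", "my_source", "hash_v10", "{}"),
    ],
)


def drop_migrated_schema(conn: sqlite3.Connection, schema_name: str, engine_version: int = 10) -> int:
    """Hypothetical helper: delete rows stored with `engine_version`; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM _dlt_version WHERE schema_name = ? AND engine_version = ?",
        (schema_name, engine_version),
    )
    conn.commit()
    return cur.rowcount


removed = drop_migrated_schema(conn, "my_source")
latest_engine = conn.execute(
    "SELECT MAX(engine_version) FROM _dlt_version WHERE schema_name = ?", ("my_source",)
).fetchone()[0]
print(removed, latest_engine)  # 1 9
```

Keep in mind that the local pipeline working dir is migrated as well, so cleaning up the destination alone may not be enough.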
New Versioning Scheme
We'll follow the classical `major.minor.patch` scheme, where:
- `major` means breaking changes and removed deprecations
- `minor` means new features, sometimes with automatic migrations
- `patch` means bug fixes
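As a quick illustration of the scheme, the helper below classifies an upgrade by comparing two version strings. It is a hypothetical utility written for this note, not part of dlt, and it only understands plain `major.minor.patch` strings (pre-release tags such as `0.9.9a1` would need extra parsing).

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string; pre-release tags like 'a1' are not handled."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def release_kind(old: str, new: str) -> str:
    """Classify an upgrade from `old` to `new` under the major.minor.patch scheme."""
    a, b = parse_version(old), parse_version(new)
    if b <= a:
        raise ValueError(f"{new} is not newer than {old}")
    if b.major != a.major:
        return "major"  # breaking changes, deprecations removed
    if b.minor != a.minor:
        return "minor"  # new features, possibly automatic migrations
    return "patch"  # bug fixes only


print(release_kind("0.5.4", "1.0.0"))  # major
print(release_kind("1.0.0", "1.1.0"))  # minor
print(release_kind("1.1.0", "1.1.2"))  # patch
```

Under this scheme, a requirement such as `dlt>=1.0.0,<2.0.0` accepts minor and patch releases but stops before the next major (breaking) one.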
Version rollout plan
- `0.5.x` will still be supported: docs will remain available and major bugs will be fixed.
- We plan an alpha release with the sources merged into the core and doc updates early next week.
- We plan the `1.0.0` release for the second or third week of September.
- Each following week we'll release one of the follow-up features.
- Track our progress here: https://github.com/orgs/dlt-hub/projects/9/views/3
0.9.9a1 pre-release available
This pre-release brings the SQL database, filesystem and rest_api sources into the core and introduces 95% of the breaking changes and deprecations. New documentation is not yet available.
⚠️ Do not deploy in production ⚠️ It will migrate existing schemas, so try it on fresh datasets.
Try `from dlt.sources.sql_database import sql_table` or `dlt init sql_database duckdb` to start a new project.
breaking changes and warnings
Deprecations and Breaking Changes
- Load packages with terminally failed jobs will be automatically aborted with an exception. Until now, users had to detect this in code (that behavior will still be available). https://github.com/dlt-hub/dlt/issues/1749
- To use the `iceberg` table format on the Athena destination, set `table_format` to `iceberg` on all your resources instead of using the `force_iceberg` flag in the destination configuration. This flag is deprecated but will still be observed for backward compatibility.
- Will migrate schemas to engine v10. This is irreversible.
Internal or obscure features:
- A few column hints (`foreign_key` and `index`) that were undocumented and had no real use will be removed.
- If a primary key was used in a nested table, linking was not created in `relational.py`. Now linking is skipped only when a nested row is fitted into a table that is not nested (does not have a parent), a rare case for someone who does not want `dlt` linking.
- Removes `generate_dlt_id` from the JSON relational normalizer config.
- Deprecates `skip_complex_types` in the `dlt` Pydantic config and asks to use `skip_nested_types` instead.
- If a list of resources is passed to the `run` method, they will be evaluated in a single ad-hoc source. Previously each resource was evaluated separately (serialized). https://github.com/dlt-hub/dlt/pull/1535
Other features
- Feat/1492 extend timestamp config to handle naive timestamps (without timezone) by @donotpush in https://github.com/dlt-hub/dlt/pull/1669
- Fix/1571 Incremental: Optionally load or ignore/exclude/include records with `cursor_path` missing or None value by @willi-mueller in https://github.com/dlt-hub/dlt/pull/1576
- Don't use Custom Embedding Functions on LanceDB by @Pipboyguy in https://github.com/dlt-hub/dlt/pull/1771
- sets default concurrency for blob upload for adlfs to 1 to avoid massive memory usage on large files by @rudolfix in https://github.com/dlt-hub/dlt/pull/1779
- Fix/1790 support incremental load with arrow when cursor column is not nullable by @willi-mueller in https://github.com/dlt-hub/dlt/pull/1791
- controls row group size and empty tables in memory buffer when writing parquet by @rudolfix in https://github.com/dlt-hub/dlt/pull/1782
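The incremental change (#1576) is easiest to see as a filtering rule. The sketch below is a plain-Python model, not dlt code: `filter_incremental`, `MissingCursorError`, and the policy names are ours, and it keeps only records whose cursor is greater than the last seen value (dlt's own comparison and deduplication rules are more involved). As far as we know, dlt exposes the same choice through an `on_cursor_value_missing` argument on `dlt.sources.incremental`; check the incremental loading docs for the exact name and values.

```python
from typing import Any, Dict, Iterable, Iterator, Literal

Policy = Literal["raise", "include", "exclude"]


class MissingCursorError(ValueError):
    """Raised by the 'raise' policy when a record lacks a usable cursor value."""


def filter_incremental(
    records: Iterable[Dict[str, Any]],
    cursor_path: str,
    last_value: Any,
    on_missing: Policy = "raise",
) -> Iterator[Dict[str, Any]]:
    """Yield records newer than last_value, handling missing/None cursors per policy."""
    if on_missing not in ("raise", "include", "exclude"):
        raise ValueError(f"unknown policy {on_missing!r}")
    for record in records:
        value = record.get(cursor_path)
        if value is None:  # key absent or explicitly None
            if on_missing == "raise":
                raise MissingCursorError(f"record {record!r} has no value for {cursor_path!r}")
            if on_missing == "include":
                yield record
            continue  # "exclude" drops the record; "include" has already yielded it
        if last_value is None or value > last_value:
            yield record


rows = [
    {"id": 1, "updated_at": 5},
    {"id": 2, "updated_at": None},
    {"id": 3},
    {"id": 4, "updated_at": 12},
]
print([r["id"] for r in filter_incremental(rows, "updated_at", 10, "include")])  # [2, 3, 4]
print([r["id"] for r in filter_incremental(rows, "updated_at", 10, "exclude")])  # [4]
```

With the default `"raise"` policy, the same rows stop at record 2, which matches the strict behavior that existed before the fix.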