dlt
dlt copied to clipboard
Normalize model files
Description
Starting point of implementing normalization of model files.
In this PR a ModelNormalizer is added which needs to be implemented and tested. There is an attempt at adding a dlt_load_id, but at this point this will fail because of the way the insert statement is created in the load stage. Furthermore, regarding the dlt_load_id, the call method of the normalizer will have to update the schema and return this schema update in cases where the dlt_load_id column is not yet part of the schema.
There is an attempt at adding a dlt_load_id, but at this point this will fail because of the way the insert statement is created in the load stage. This has stopped failing (handled in feat/2366-sql-jobs-3 ), except for the case when the Select statement has a star schema.
Deploy Preview for dlt-hub-docs canceled.
| Name | Link |
|---|---|
| Latest commit | 5f79604c0c6ced909114ab30daae7b38edf99b7a |
| Latest deploy log | https://app.netlify.com/sites/dlt-hub-docs/deploys/68235fd56394060008c52600 |
Sqlite doesn't have an in built UUID, will probably resort to something similar to what's used in Redshift's case
@anuunchin really nice, I think we are almost there! What I would additionally maybe consider, is to create a test setup, similar to the one where you test normalization for all dialects, that runs the normalizer for many different queries for the duckdb dialect to see if edge cases are covered. This test should not do any loads, but just get a schema and a query and some settings (put dlt columns or not) and you assert the normalized query. These will run super fast and you can test lots of different things. We should test that:
- common table expressions work:
WITH (query) SELECT ... - nested star selects should probably work (not sure):
SELECT name FROM (SELECT * FROM customer) - Star selects on joins should not work: `SELECT t1.*, t2.name FROM (t1 join t2)
- Test different kinds of re-ordering also when joining
I'm sure there are quite a few more, but having this setup would be good to be able to add these cases later.