dlt icon indicating copy to clipboard operation
dlt copied to clipboard

Normalize model files

Open sh-rp opened this issue 7 months ago • 3 comments
trafficstars

Description

Starting point of implementing normalization of model files.

In this PR a ModelNormalizer is added which needs to be implemented and tested. There is an attempt at adding a dlt_load_id, but at this point this will fail because of the way the insert statement is created in the load stage. Furthermore, regarding the dlt_load_id, the call method of the normalizer will have to update the schema and return this schema update in cases where the dlt_load_id column is not yet part of the schema.

sh-rp avatar Apr 10 '25 13:04 sh-rp

There is an attempt at adding a dlt_load_id, but at this point this will fail because of the way the insert statement is created in the load stage. This has stopped failing (handled in feat/2366-sql-jobs-3 ), except for the case when the Select statement has a star schema.

anuunchin avatar Apr 11 '25 15:04 anuunchin

Deploy Preview for dlt-hub-docs canceled.

Name Link
Latest commit 5f79604c0c6ced909114ab30daae7b38edf99b7a
Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/68235fd56394060008c52600

netlify[bot] avatar Apr 15 '25 08:04 netlify[bot]

Sqlite doesn't have an in built UUID, will probably resort to something similar to what's used in Redshift's case

anuunchin avatar Apr 22 '25 10:04 anuunchin

@anuunchin really nice, I think we are almost there! What I would additionally maybe consider, is to create a test setup, similar to the one where you test normalization for all dialects, that runs the normalizer for many different queries for the duckdb dialect to see if edge cases are covered. This test should not do any loads, but just get a schema and a query and some settings (put dlt columns or not) and you assert the normalized query. These will run super fast and you can test lots of different things. We should test that:

  • common table expressions work: WITH (query) SELECT ...
  • nested star selects should probably work (not sure): SELECT name FROM (SELECT * FROM customer)
  • Star selects on joins should not work: `SELECT t1.*, t2.name FROM (t1 join t2)
  • Test different kinds of re-ordering also when joining

I'm sure there are quite a few more, but having this setup would be good to be able to add these cases later.

sh-rp avatar May 09 '25 08:05 sh-rp