dlt
dlt copied to clipboard
feat: SQLGlot for DDL parsing
Relates to #2561
Description
This maps dlt hints to SQLGlot DDL expressions. This internal feature could enable future usage:
Table discovery
To enable @dlt.transformation, it's useful to know about tables manages outside of dlt.
It would make sense to discover from the dlt.Dataset API to add tables to its schema. Discovery should be limited to the same "database/dataset" namespace on the destination. Retrieving DDL from destination is a very cheap operation.
We would need a new hint for external tables.
tentative API.
dataset = dlt.Dataset(...)
table_foo = dataset.table("foo") # regular access
table_bar = dataset.table("bar") # raise TableNotExist
dataset.discover() # all tables within "database/dataset" namespace
dataset.discover("bar") # single specific table
dataset.discover(["bar", "baz"]) # multiple specific tables
dataset.discover(pattern="ba.*") # unknown number using regex
table_bar = dataset.table("bar") # succeeds because table is now in schema
This effect is persisted to the schema storage and the destination.
Table creation
We have a lot of templated SQL strings for destination clients. We also use destination-specific SDK libraries.
Generating DDL from dlt hints could simplify internals and help with maintenance. This potentially simplifies supporting and testing new SQL backends.
More importantly, dlt -> DDL allows more powerful user configuration (e.g., create an index).
Deploy Preview for dlt-hub-docs canceled.
| Name | Link |
|---|---|
| Latest commit | c1002c9e284bdde4f4feabb5e0cd3f72c17b11f6 |
| Latest deploy log | https://app.netlify.com/projects/dlt-hub-docs/deploys/684aeaa1995ccc0008c7419d |
This is an improvement over SQLAlchemy reflection because it doesn't require the existence of an SQLAlchemy driver for the destination