New "refresh" mode and "dev_mode"
Description
Still needs some cleanup and testing.
- `full_refresh` arg replaced with `dev_mode`
  - Deprecation warning when `full_refresh=True`
- Add `refresh: "full" | "replace"` arg to pipeline
  - `full` drops all tables, resource states, and source state keys
  - `replace` drops tables/state only for the resources selected to extract
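A minimal sketch of the deprecation shim described above. This is an illustrative stand-in, not dlt's actual implementation; the function body and return value are assumptions:

```python
import warnings

def pipeline(pipeline_name, full_refresh=None, dev_mode=False, refresh=None):
    """Illustrative stand-in for dlt.pipeline's argument handling."""
    if full_refresh is not None:
        # Old flag still works, but warns and maps onto the new dev_mode flag
        warnings.warn(
            "'full_refresh' is deprecated, use 'dev_mode' instead",
            DeprecationWarning,
        )
        dev_mode = full_refresh
    if refresh not in (None, "full", "replace"):
        raise ValueError(f"unknown refresh mode: {refresh!r}")
    return {"name": pipeline_name, "dev_mode": dev_mode, "refresh": refresh}
```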
The new refresh mode introduces a new file to the load package: `dropped_tables.json`, which is created during the extract step.
That was the simplest way I could think of to drop tables at the appropriate stages: the local schema and state are updated in extract, and destination tables get dropped in the load step.
Otherwise we'd have to start poking at destination storage before extract, which feels wrong.
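To make the mechanism concrete, here is a sketch of how `dropped_tables.json` could be written in extract and read back in load. The file layout and field names here are assumptions for illustration, not the actual load-package format:

```python
import json
from pathlib import Path

def write_dropped_tables(package_dir: str, tables: list) -> None:
    # Extract step: record which tables the drop command selected
    path = Path(package_dir) / "dropped_tables.json"
    path.write_text(json.dumps(tables))

def read_dropped_tables(package_dir: str) -> list:
    # Load step: read the list back so destination tables can be dropped
    path = Path(package_dir) / "dropped_tables.json"
    if not path.exists():
        return []  # nothing was dropped for this package
    return json.loads(path.read_text())
```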
Refresh works as follows:
- Before extract -> use the drop command to wipe all/selected resource state and tables (local state only; destination tables are not deleted, and normalize/load doesn't run as part of drop)
- Add `dropped_tables.json` to the load package with the tables found by the drop command
- In the load stage -> read `dropped_tables.json` and drop those tables from the destination schema (non-SQL destinations fall back to truncate on dataset init)
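The load-stage half of that flow could be sketched like this, using in-memory SQLite as a stand-in SQL destination (the real destination client API differs; this only illustrates the drop step):

```python
import sqlite3

def drop_tables_in_destination(conn: sqlite3.Connection, dropped: list) -> None:
    # Drop each table recorded in dropped_tables.json. Tables that never
    # made it to the destination are skipped via IF EXISTS.
    with conn:  # run all drops in one transaction
        for table in dropped:
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
```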
Related Issues
- Resolves #588
@steinitzu this looks pretty cool now. in the meantime we refactored extract code so please merge the current changes. sorry for that!
@steinitzu do you need a review here? PR is still draft
OK! now tests are almost in order! still to do:
1. I have a strong opinion to rename our modes to `drop_sources`, `drop_resources` and `drop_data`. WDYT? This corresponds more closely to what actually happens.
2. Is it much work to add `refresh` to the `run` and `extract` methods of `Pipeline`?
3. We need to improve docstrings and docs.
4. Let's remove all schemas for a given name from the destination. The old flag was not that bad IMO - it leaves all details to the implementer.
5. A simple test for local filesystem - it really makes sense when I look at how many users use that destination.
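A sketch of the proposed mode semantics as a simple lookup, just to pin down what each renamed mode would drop. This is my reading of the discussion, not dlt's internal representation, and the per-mode details are assumptions:

```python
# Assumed semantics of the three proposed refresh modes
REFRESH_MODES = {
    # drop all tables plus all resource and source state
    "drop_sources": {"tables": "all", "resource_state": "all", "source_state": "all"},
    # drop only tables/state for the resources selected to extract
    "drop_resources": {"tables": "selected", "resource_state": "selected", "source_state": "kept"},
    # keep schema, truncate data and wipe incremental resource state
    "drop_data": {"tables": "truncated", "resource_state": "selected", "source_state": "kept"},
}
```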
- Yes. I think that naming is much better. I changed it.
- Added this and tests for each. Nice to have this.
- I updated both a bit. Let me know if anything missing.
- Doing this again now, but I renamed the `replace_schema` argument of `drop_tables` to `delete_schema` and changed the behaviour. This is simpler for initializing the client: no need to do the dance with swapping out the client schema, we just drop tables and delete the schema history in one transaction, then "update stored schema" runs as normal afterwards.
- We don't support `drop_tables` in filesystem. Should we add it? I guess it would basically be the same as truncate (maybe delete the folders too?). But I made a test for `refresh="drop_data"`.
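The "drop tables and delete the schema history in one transaction" behaviour could look roughly like this against a SQLite stand-in. The `_dlt_version` column layout here is an assumption for illustration:

```python
import sqlite3

def drop_tables_and_schema(conn, tables, schema_name, delete_schema=True):
    # Single transaction: drop destination tables and, optionally, wipe the
    # stored schema history so "update stored schema" re-creates it afterwards.
    with conn:
        for t in tables:
            conn.execute(f'DROP TABLE IF EXISTS "{t}"')
        if delete_schema:
            conn.execute(
                "DELETE FROM _dlt_version WHERE schema_name = ?", (schema_name,)
            )
```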
@sh-rp thanks for review. the part where I asked you to look deeper is how we pass and write the package state and if the pipeline state is handled correctly. I do not see any updates on that. so I assume this is mostly right?
@steinitzu plenty of tests are failing. my take is that you should try to run common test locally so we have less turnover on CI (if you do not do that already)
@rudolfix got it. I don't always run them but should for sure before pushing. Usually I try to run the tests I think are relevant; in this case a lot more were relevant than I thought.
Common + load postgres/duck/local filesystem tests are passing locally now
The state stuff looks good to me too. The way it is loaded to the destination has not been changed here yet; that is for an upcoming PR (correct me if I'm wrong there @steinitzu). Only the new fields have been added, and the state is committed after extraction, so I don't see a problem, or any migration steps we need to take care of.