dlt icon indicating copy to clipboard operation
dlt copied to clipboard

New "refresh" mode and "dev_mode"

Open steinitzu opened this issue 1 year ago • 3 comments

Description

Still needs some cleanup and testing.

  • full_refresh arg replaced with dev_mode
  • Deprecation warning when full_refresh=True
  • Add refresh: "full" | "replace" arg to pipeline
  • full drops all tables/resource states/source state keys
  • replace drops tables/state from the resources selected to extract

The new refresh mode introduces a new file to the load package: dropped_tables.json, this gets created in extract
That was the simplest way I could think of to drop tables in appropriate stages (local schema and state is updated in extract and destination tables get dropped in load step).
Otherwise we'd have to start poking at destination storage before extract which feels wrong.

Refresh works as follows:

  1. Before extract -> uses drop command to wipe all/selected resource state and tables (local state only, destination tables are not deleted and normalize/load doesn't run as part of drop)
  2. Add dropped_tables.json to load package with the tables found by drop command
  3. In load stage -> read dropped_tables.json and drop the tables from destination schema (non-sql destination fall back to truncate on dataset init)

Related Issues

  • Resolves # #588

steinitzu avatar Mar 07 '24 00:03 steinitzu

Deploy Preview for dlt-hub-docs canceled.

Name Link
Latest commit 879e4e5ea61229c79b4d297a917e3930abf65609
Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6657a4adfac32b0008eaee57

netlify[bot] avatar Mar 07 '24 00:03 netlify[bot]

@steinitzu this looks pretty cool now. in the meantime we refactored extract code so please merge the current changes. sorry for that!

rudolfix avatar Apr 08 '24 20:04 rudolfix

@steinitzu do you need a review here? PR is still draft

rudolfix avatar Apr 28 '24 18:04 rudolfix

OK! now tests are almost in order! still to do:

1. I have a strong opinion to rename our modes  to `drop_sources`, `drop_resources` and `drop_data`. WDYT? this corresponds more to what happens

2. is it much work to add `refresh` to `run` and `extract` methods of `Pipeline`?

3. we need to improve docstrings and docs

4. let's remove all schemas for a given name from a destination. Old flag was not that bad IMO - it leaves all details to the implementer

5. a simple test for local filesystem - it really makes sense when I look how many users use that destination
  1. Yes. I think that naming is much better. I changed it.
  2. Added this and tests for each. Nice to have this.
  3. I updated both a bit. Let me know if anything missing.
  4. Doing this again now, but I renamed the drop_tables replace_schema argument to delete_schema and changed the behaviour.
    This is simpler for initializing client. No need to do the dance with swapping out client schema, we just drop tables and delete schema history in one transaction -> then "update stored schema" runs as normal after.
  5. We don't support drop_tables in filesystem. Should we add it? I guess it would be same as truncate basically (maybe delete the folders also?). But I made a test for refresh=drop_data

steinitzu avatar May 22 '24 21:05 steinitzu

@sh-rp thanks for review. the part where I asked you to look deeper is how we pass and write the package state and if the pipeline state is handled correctly. I do not see any updates on that. so I assume this is mostly right?

@steinitzu plenty of tests are failing. my take is that you should try to run common test locally so we have less turnover on CI (if you do not do that already)

rudolfix avatar May 29 '24 09:05 rudolfix

@steinitzu plenty of tests are failing. my take is that you should try to run common test locally so we have less turnover on CI (if you do not do that already)

@rudolfix got it. I don't always run them but should for sure before pushing. Usually I try to run the tests I think are relevant, in this case lot more than I thought.

Common + load postgres/duck/local filesystem tests are passing locally now

steinitzu avatar May 29 '24 14:05 steinitzu

The state stuff looks good to me too. The way it is loaded to the destination has not been changed here yet, this is for an upcoming PR (correct me if I'm wrong there @steinitzu), only those new fields have been added and the state is committed after extraction, so I don't see a problem, or any migration stuff we need to take care of.

sh-rp avatar Jun 03 '24 12:06 sh-rp