post merge improvements rollup

Open rudolfix opened this issue 1 year ago • 0 comments

Background

Several things could be much better and several tests are missing. Let's try to fix those one by one.

Tasks

    • [x] When the `replace` write disposition is used on an existing resource, the resource state is not reset before loading. Reset the resource state, see #214 (we already have the layout implemented).
    • [ ] ~The loader needs refactoring. We should split it into stage (load files to the bucket), load (does what it does now, but without merging) and merge, which generates and executes merge transformations.~
    • [x] Right now, propagation of the root key is enabled for merge tables in the data normalizer. We need something more clever, i.e. enable it only when no root key is propagated (the user can set this up and use something other than the dlt default).
    • [ ] Move column propagation into the table definition. Right now it lives in the normalizer config. The propagation is still done by the normalizer, so those elements should be validated by it.
    • [ ] Propagated columns should inherit the hints from the parent table, or such hints could be specified in the propagation definition.
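
For the last task, one possible shape of hint inheritance is sketched below. This is only an illustration under assumptions: `inherit_hints`, the `overrides` argument and the hint names used here are hypothetical and not dlt's API, and columns are modelled as plain dicts.

```python
from typing import Any, Dict, Optional

ColumnHints = Dict[str, Any]


def inherit_hints(
    parent_column: ColumnHints,
    propagated_name: str,
    overrides: Optional[ColumnHints] = None,
) -> ColumnHints:
    """Build hints for a propagated column (hypothetical helper, not dlt code).

    The propagated column starts with a copy of the parent's hints, gets its
    own name, and then any hints given in the propagation definition win.
    """
    # Copy everything except the name so the parent column is never mutated.
    hints = {key: value for key, value in parent_column.items() if key != "name"}
    hints["name"] = propagated_name
    # Hints from the propagation definition take precedence over inherited ones.
    if overrides:
        hints.update(overrides)
    return hints


parent = {"name": "_dlt_id", "data_type": "text", "nullable": False, "unique": True}
child = inherit_hints(parent, "_dlt_root_id", overrides={"unique": False})
print(child)
```

Letting the propagation definition override inherited hints covers both options named in the task. Which hints should be inherited at all is left open here.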

Tests

We have end-to-end tests, but the following unit tests are missing:

    • partial columns: column inference when there's no data type, merging of partial columns, non null coercion on partial columns (hmmm this should not happen...)
    • test sql jobs on dummy and destinations
    • test followup tasks (all file tasks completed, failed and completed tasks allowed)
    • LoadJob: job_id and job_file_info (dummy)
    • sql client test truncate
    • test staging in sql_client_impl: init_storage, truncate, update schemas selectively for tables
    • partial columns are not updated (without data type)
    • we are not saving the default values, check the flows with schema export/import
    • table, column diff and merge
    • get_child_tables and get_top_level table
    • loader: list jobs for table, add new job
    • merge generation for different cases, check with sqlfluff: case without child tables, case without any keys, case with single merge and single primary, case with compound keys
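
As a rough idea of the granularity meant here, a unit test for the child/top-level table lookups could look like the sketch below. The helpers `child_tables` and `top_level_table` are stand-ins written for this example, not dlt's implementation. The assumption that each table is a dict with an optional `parent` key, and the table names themselves, are illustrative too.

```python
from typing import Any, Dict, List

Tables = Dict[str, Dict[str, Any]]


def child_tables(tables: Tables, table_name: str) -> List[str]:
    """Return table_name followed by all of its descendants, depth first."""
    result = [table_name]
    for name, table in tables.items():
        if table.get("parent") == table_name:
            result.extend(child_tables(tables, name))
    return result


def top_level_table(tables: Tables, table_name: str) -> str:
    """Follow the parent links up to the table that has no parent."""
    parent = tables[table_name].get("parent")
    while parent is not None:
        table_name = parent
        parent = tables[table_name].get("parent")
    return table_name


def test_child_and_top_level_tables() -> None:
    tables: Tables = {
        "orders": {"name": "orders"},
        "orders__items": {"name": "orders__items", "parent": "orders"},
        "orders__items__tags": {"name": "orders__items__tags", "parent": "orders__items"},
        "customers": {"name": "customers"},
    }
    # A table with nested children returns the whole chain, parent first.
    assert child_tables(tables, "orders") == [
        "orders",
        "orders__items",
        "orders__items__tags",
    ]
    # A table without children returns only itself.
    assert child_tables(tables, "customers") == ["customers"]
    # Any table in a chain resolves to the same top-level table.
    assert top_level_table(tables, "orders__items__tags") == "orders"
    assert top_level_table(tables, "customers") == "customers"


test_child_and_top_level_tables()
```

The same pattern (small hand-built schema dict, no destination involved) should carry over to the diff/merge and partial column cases.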

rudolfix, Apr 09 '23 19:04