
Deletes not being output to the table materialization

Open devcshort opened this issue 1 year ago • 2 comments

I am running diff_tables through a Python script and materializing all rows to a table within my DB. This seems to work great for figuring out our updated columns and rows; however, deletes are not being materialized.

Below is the code I'm using. I wanted to check whether an ID that I expected to see in my tables (but which isn't there) would show up in the output the Python script gives me, and it did. I would expect anything that shows up in the diff_tables output within my script to also be materialized into the table that data_diff uses for materialization. From what I can tell, it is not outputting deletes during materialization, which throws a wrench in the pipeline I'm currently working on.

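import data_diff

try:
    # Each d is a (sign, row) tuple; d[1][0] is the first column of the row (the ID here).
    for d in data_diff.diff_tables(
        source_table,
        target_table,
        extra_columns=columns,
        key_columns=key_columns,
        materialize_to_table=f"NORSE_DIFF.{SNOWFLAKE_CONN_INFO['schema']}.{table_name}",
        materialize_all_rows=True,
    ):
        # Print only the ID that is missing from the materialized table.
        if d[1][0] == "c91e4af2-4585-5cbb-924b-cbeb12b7919e":
            print(d[1][0])
except Exception as e:
    print(e)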
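
For reference, here is a minimal sketch of how the deletes could be picked out of the iterator output on the Python side while the materialized table is missing them. It assumes each yielded item is a (sign, row) tuple, that "-" marks a row found only in the source table and "+" a row found only in the target, and that the key columns come first in the row. find_deleted_keys and the sample rows are hypothetical and not part of data_diff.

```python
from collections import defaultdict


def find_deleted_keys(diff_rows, key_len=1):
    """Return keys that appear only with a "-" sign (hypothetical helper).

    Assumes diff_rows yields (sign, row) tuples and that the first
    key_len values of each row are the key columns.
    """
    signs_by_key = defaultdict(set)
    for sign, row in diff_rows:
        signs_by_key[tuple(row[:key_len])].add(sign)
    # A key seen with both "-" and "+" is an update; only "-" means a delete.
    return sorted(key for key, signs in signs_by_key.items() if signs == {"-"})


# Made-up sample standing in for the output of diff_tables.
sample = [
    ("-", ("a1", "old value")),
    ("+", ("a1", "new value")),
    ("-", ("c9", "row removed from target")),
    ("+", ("d4", "row added to target")),
]

deleted = find_deleted_keys(sample)
print(deleted)
```

In the script above, the same helper could be fed the iterator returned by diff_tables, with key_len set to len(key_columns), and the resulting keys compared against what ends up in the materialized table.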

I'm currently using [email protected] on macOS (Apple Silicon).

This is running within a Dagster environment as well.

devcshort • Oct 19 '23 22:10

Seems like there may be an issue with the all_rows query here

They are passed into _materialize_diff here

dlawin • Oct 31 '23 19:10

@devcshort can you explain, at a high level, how to materialize data-diff results to a Redshift table with the open-source version, for comparison against the Redshift DB itself? I intend to do the same using dbt and Redshift with a local dbt Core setup.

a-s-sarkar-9299 • Jan 08 '24 16:01

Hi @devcshort,

I'm sorry for the delay in following up on this. Thank you for raising this issue and for looking into potential solutions!

We made a hard decision to sunset the data-diff package and won't provide further development or support.

If that's of interest, over the past few months, we have rewritten the diffing engine in Datafold Cloud and solved many issues that existed in this package's diffing algorithm.

-Gleb

glebmezh • May 17 '24 13:05