data-diff
Deletes not being output into the table materialization
I am running diff_tables through a Python script and materializing all rows to a table within my DB. This seems to work great for figuring out our updated columns and rows; however, deletes are not being materialized.
Below is the code I'm using. I wanted to check whether an ID that I expected to see in my materialized table (but which isn't there) would show up in the output the Python script gives me, and it did. I would expect anything that shows up in the output of diff_tables within my script to also be materialized into the table that data_diff uses for materialization. From what I can tell, deletes are not being written during materialization, which throws a wrench in the pipeline I'm currently working on.
try:
    for d in data_diff.diff_tables(
        source_table,
        target_table,
        extra_columns=columns,
        key_columns=key_columns,
        materialize_to_table=f"NORSE_DIFF.{SNOWFLAKE_CONN_INFO['schema']}.{table_name}",
        materialize_all_rows=True,
    ):
        if d[1][0] == "c91e4af2-4585-5cbb-924b-cbeb12b7919e":
            print(d[1][0])
except Exception as e:
    print(e)
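Since the rows do appear in the iterator output, one possible stopgap is to classify that output directly in Python. This is only a minimal sketch, not part of data-diff: it assumes each item yielded by diff_tables is a (sign, row) pair in which "-" marks a row found only in the source table and "+" a row found only in the target table, and that the key columns come first in each row (which matches the d[1][0] lookup above). The helper name classify_diff and the sample rows are hypothetical.

```python
from collections import defaultdict


def classify_diff(diff_rows, key_len=1):
    """Group (sign, row) pairs by key and label each key.

    Assumption: "-" means the row exists only in the source table and
    "+" means it exists only in the target table. If the source is the
    older snapshot, a key seen only with "-" is a delete.
    """
    signs_by_key = defaultdict(set)
    for sign, row in diff_rows:
        key = tuple(row[:key_len])  # key columns are assumed to come first
        signs_by_key[key].add(sign)

    result = {"deleted": [], "inserted": [], "updated": []}
    for key, signs in signs_by_key.items():
        if signs == {"-"}:
            result["deleted"].append(key)
        elif signs == {"+"}:
            result["inserted"].append(key)
        else:
            # Both signs present: the key exists on both sides with different values.
            result["updated"].append(key)
    return result


# Hypothetical sample data standing in for the diff_tables output.
sample = [
    ("-", ("id-1", "old")),
    ("+", ("id-1", "new")),
    ("-", ("id-2", "gone")),
    ("+", ("id-3", "added")),
]
summary = classify_diff(sample)
print(summary)
```

The keys under "deleted" could then be compared against the materialized table to confirm which rows are missing from it.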
I'm currently using
[email protected]
MacOS Apple Silicon
This is running within a Dagster environment as well.
Seems like there may be an issue with the all_rows query here
They are passed into _materialize_diff here
@devcshort can you explain, at a high level, how to materialize data-diff results to a Redshift table with the open source version, for comparison within the Redshift DB itself? I intend to do the same using dbt and Redshift in local dbt Core.
Hi @devcshort,
I'm sorry for the delay in following up on this. Thank you for raising this issue and for looking into potential solutions!
We made a hard decision to sunset the data-diff package and won't provide further development or support.
If that's of interest, over the past few months, we have rewritten the diffing engine in Datafold Cloud and solved many issues that existed in this package's diffing algorithm.
-Gleb