sling-cli icon indicating copy to clipboard operation
sling-cli copied to clipboard

How to do incremental update with new rows where update_key is not set

Open ArcqStrato opened this issue 1 year ago • 2 comments

Suppose I have an Oracle table with two columns, ID (the primary key) and MDATUM (the update key).

ID MDATUM
1 NULL
2 NULL
3 2024-09-18 02:00:00.000

Now I upload this table to Snowflake.

Next, one row is added to the table (ID=4), and one row is updated (ID=2), which means it gets an MDATUM:

so it looks like this:

ID MDATUM note
1 NULL
2 2024-09-18 03:00:00.000 this row is updated
3 2024-09-18 02:00:00.000
4 NULL this row is new

How can get those two new rows into Snowflake using an incremental load?

When I load with

--mode incremental \
--primary-key 'ID' \
--update-key 'MDATUM' \

then I get this table:

ID MDATUM note
1 NULL
2 2024-09-18 03:00:00.000 correctly updated
3 2024-09-18 02:00:00.000

As you can see, ID=3 is updated as expected. The new row (ID=4), on the other hand, is not inserted into the target table.

I believe this is in line with your documentation.

However, from a practical standpoint, this is not helpful. My example above is a real-life example from our production table. And it makes sense: If a new row is added then that's not an update, and thus the update_key is empty. Only if a row is modified then the update_key is set, which makes sense.

I therefore believe there should be another load strategy that loads "new records after max(update_key)" plus "new records with primary_key larger then max(primary_key).

A workaround is I can make run Sling twice:

First, I load using these parameters:

--mode incremental \
--primary-key 'ID' \
--update-key 'MDATUM' \

This gets me all rows where the update_key MDATUM is newer then max(MDATUM) in my target table, i.e., all updated rows.

Then I run Sling again and this time I use:

--mode incremental \
--primary-key 'ID' \
--update-key 'ID' \

This should get me all IDs where the ID is higher then max(ID) in my target table, i.e., all new rows.

I feel my use-case is quite plausible and should be quite widespread.

What are your thoughts on this? And do you see a problem with my workaround?

ArcqStrato avatar Sep 18 '24 15:09 ArcqStrato

Would love to know any feed back on if using ID for both primary and update key worked?

justin-fundrise avatar Mar 19 '25 12:03 justin-fundrise

@justin-fundrise it works in practice. It will replace your existing records in place

aleenprd avatar Apr 01 '25 07:04 aleenprd