Jorrit Sandbrink
See this draft PR for relevant discussions and code inspiration: https://github.com/dlt-hub/dlt/pull/1294
My take, based on previous work in https://github.com/dlt-hub/dlt/pull/1294: Add a new key-based merge strategy called `upsert`:
- if key is not present in destination table ➜ insert record
- if...
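A minimal in-memory sketch of the key-based semantics described above, for illustration only. The function name `upsert_rows` and the `key` parameter are hypothetical and not part of dlt. The second bullet is truncated in this thread, so the "key already present ➜ update record" branch is an assumption based on the usual meaning of upsert.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def upsert_rows(destination: List[Row], incoming: Iterable[Row], key: str) -> List[Row]:
    """Hypothetical illustration of key-based upsert on lists of dicts."""
    # Index copies of the destination rows by key so the input is not mutated.
    by_key: Dict[Any, Row] = {row[key]: dict(row) for row in destination}
    for row in incoming:
        if row[key] not in by_key:
            # Key not present in destination: insert the record.
            by_key[row[key]] = dict(row)
        else:
            # Key already present (assumed behavior): update the existing record.
            by_key[row[key]].update(row)
    return list(by_key.values())


dest = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
new = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
print(upsert_rows(dest, new, key="id"))
```

Because rows are indexed by key, each key ends up with exactly one record; this only holds if the key is actually unique, which is why a proper primary key matters for `upsert`.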
@rudolfix
> "what you can do with it that you cannot do with the current merge?"

There's no extra functionality indeed. Clear semantics would be the main benefit. People are...
@rudolfix
1. I would approach it like this:
   - use `DELETE` + `INSERT` as base implementation
   - use `MERGE` if `WHEN NOT MATCHED BY SOURCE` is supported (e.g. on `mssql`...
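A rough sketch of the two SQL shapes mentioned above: a `DELETE` + `INSERT` base path and a `MERGE` path for destinations that support it. The helper `build_upsert_sql`, the staging-table approach, and the single-column key are all assumptions for illustration; identifiers are assumed to be already escaped. The `WHEN NOT MATCHED BY SOURCE` clause is left out because the comment is truncated before explaining how it would be used.

```python
from typing import List, Sequence


def build_upsert_sql(
    table: str, staging: str, key: str, columns: Sequence[str], supports_merge: bool
) -> List[str]:
    """Hypothetical generator for upsert statements from a staging table."""
    col_list = ", ".join(columns)
    if not supports_merge:
        # Base implementation: remove rows whose key appears in staging, then insert all staged rows.
        return [
            f"DELETE FROM {table} WHERE {key} IN (SELECT {key} FROM {staging});",
            f"INSERT INTO {table} ({col_list}) SELECT {col_list} FROM {staging};",
        ]
    # MERGE path: update matched rows, insert unmatched ones, in a single statement.
    updates = ", ".join(f"{c} = s.{c}" for c in columns if c != key)
    values = ", ".join(f"s.{c}" for c in columns)
    parts = [f"MERGE INTO {table} AS d USING {staging} AS s ON d.{key} = s.{key}"]
    if updates:
        # Skip the UPDATE branch when the key is the only column.
        parts.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    parts.append(f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})")
    return [" ".join(parts) + ";"]


for stmt in build_upsert_sql("dest", "stg", "id", ["id", "name"], supports_merge=False):
    print(stmt)
```

The `DELETE` + `INSERT` pair should run inside one transaction so readers never see the table with the deleted rows missing.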
@rudolfix
1. I would do it based on root key and unique key like last time: https://github.com/dlt-hub/dlt/pull/1294/files#r1635024156
2. I will do Snowflake first.
3. Yes, `upsert` would be default for...
@rudolfix I think we should close this ticket after #1466 is merged. I will create a new ticket for `upsert` support for destinations beyond `postgres` and `snowflake`.
@sh-rp We might need to increase the `adlfs` version requirement in `pyproject.toml`. Right now it's resolving to `adlfs` version `2023.8.0` which doesn't contain the fix.
@rudolfix Do we also want to do this for `synapse`? For `synapse` we already implement `SynapseCopyFileLoadJob` based on `COPY INTO`, which is the recommended approach: _"While dedicated SQL pools support...
@rudolfix @sh-rp Before continuing development on this branch, does the direction I've taken here make sense?
@rudolfix **First addressing a main consideration here:** I don't think `upsert` can (and should) be compatible with `delete-insert`. `upsert` needs a `primary_key` to ensure a one-to-one relationship between records in...
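To make the `primary_key` requirement concrete, here is a small sketch of a pre-load check that rejects a batch without a primary key or with duplicate keys, since either would break the one-to-one mapping `upsert` relies on. The function `validate_upsert_batch` is hypothetical and not an existing dlt API; where such a check would live in dlt is left open.

```python
from typing import Any, Dict, Iterable, Optional, Sequence, Set, Tuple


def validate_upsert_batch(
    rows: Iterable[Dict[str, Any]], primary_key: Optional[Sequence[str]]
) -> None:
    """Hypothetical check: upsert needs a primary key that is unique within the batch."""
    if not primary_key:
        raise ValueError("upsert requires a primary_key")
    seen: Set[Tuple[Any, ...]] = set()
    for row in rows:
        missing = [c for c in primary_key if c not in row]
        if missing:
            raise ValueError(f"row is missing primary key column(s): {missing}")
        # Build a tuple so composite primary keys are supported.
        key = tuple(row[c] for c in primary_key)
        if key in seen:
            raise ValueError(f"duplicate primary key {key!r} in batch")
        seen.add(key)


validate_upsert_batch([{"id": 1}, {"id": 2}], ["id"])  # passes silently
```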