Jorrit Sandbrink

24 comments by Jorrit Sandbrink

@sh-rp We already discussed this on Slack, but I'll add it here too for transparency and completeness.

> Also why do we need a primary key? The merge into has...
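For context, a minimal sketch of where the primary key comes into play for a merge load in `dlt` (resource name, columns, and destination are illustrative, not taken from the discussion):

```py
import dlt

# Minimal sketch: a resource loaded with the "merge" write disposition.
# The primary_key hint tells dlt which column(s) identify a record, so existing
# rows in the destination can be matched and replaced instead of duplicated.
@dlt.resource(write_disposition="merge", primary_key="id")
def users():
    yield [
        {"id": 1, "name": "alice"},
        {"id": 2, "name": "bob"},
    ]

pipeline = dlt.pipeline(pipeline_name="merge_example", destination="duckdb")
pipeline.run(users())
```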

@rudolfix Okay, I will leave this branch and PR open as input for #1129 and create a new PR that simply implements `delete-insert` for Athena Iceberg.

@rudolfix 1. Schema evolution does not seem to be supported. I did a simple test with CSV and Parquet. Multiple files can be handled, but only if they contain the same column names (error...
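A sketch of the kind of test described above: two Parquet files that do not share the same column names (file paths and columns are illustrative):

```py
import pyarrow as pa
import pyarrow.parquet as pq

# Two Parquet files whose column sets differ; loading both into one table is
# the situation that produced the error mentioned above.
table_a = pa.Table.from_pydict({"foo": [1, 2]})
table_b = pa.Table.from_pydict({"foo": [3, 4], "bar": ["x", "y"]})  # extra column

pq.write_table(table_a, "file_a.parquet")
pq.write_table(table_b, "file_b.parquet")
```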

Closed PR. This has moved to https://github.com/dlt-hub/dlt/pull/1466.

@Nintorac thanks for taking the time to create the repo. I cloned it and ran `docker compose up`:

![image](https://github.com/user-attachments/assets/38aed27f-c248-4d26-bcdd-08fd9928a4a7)

As you can see in the screenshot, I get `exec ./run.sh:...

I think I understand what's going on here. I don't think it has anything to do with local versus S3; I think it comes down to differences between `dlt` versions...

@Nintorac I did mention that, but it was an incorrect statement. The check that raises that exception was introduced after `0.5.1`. I ran `dlt --version` back then to check...

@sherlockbeard yes, here it is:

```py
import pyarrow as pa
import pyarrow.parquet as pq
from deltalake import write_deltalake

arrow_table = pa.Table.from_pydict(
    {"foo": [1, 2], "bar": [True, False]}
)
empty_arrow_table = ...
```
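The snippet above is cut off; here is a hedged completion, assuming the empty table shares the schema of `arrow_table` and both are written to a local Delta table (the path and `mode="append"` are assumptions, not from the original):

```py
import pyarrow as pa
from deltalake import write_deltalake

arrow_table = pa.Table.from_pydict({"foo": [1, 2], "bar": [True, False]})
# An empty table with the same schema as `arrow_table` (zero rows).
empty_arrow_table = arrow_table.schema.empty_table()

# Write the populated table first, then the empty one to the same location.
write_deltalake("./delta_table", arrow_table, mode="append")
write_deltalake("./delta_table", empty_arrow_table, mode="append")
```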

The challenge in fixing this is that (by default) the record hash gets stored in `_dlt_id`. Reinsertion of the same record (with the same hash) violates the uniqueness constraint on `_dlt_id`.
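To illustrate the collision (not dlt's exact hashing scheme, just a stand-in):

```py
import hashlib
import json

def row_hash(record: dict) -> str:
    # Deterministic hash over the record's content; illustrative only.
    return hashlib.md5(json.dumps(record, sort_keys=True).encode()).hexdigest()

record = {"id": 1, "name": "alice"}

first_insert = row_hash(record)  # stored in _dlt_id on first load
reinsertion = row_hash(record)   # identical content -> identical hash

# With a uniqueness constraint on _dlt_id, the reinsertion is rejected.
assert first_insert == reinsertion
```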

> best if user could add updated_at column or some kind of serial number tracking updates. (it could be used as row version instead of computing row hash which costs...
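As a sketch of what the quoted suggestion could look like with dlt's incremental hint on an `updated_at` column (resource name, columns, and cursor are assumptions, not an agreed design):

```py
import dlt

# `updated_at` acts as a row version: changed rows carry a newer timestamp,
# so updates can be tracked without computing a row hash.
@dlt.resource(write_disposition="merge", primary_key="id")
def orders(updated_at=dlt.sources.incremental("updated_at")):
    yield [
        {"id": 1, "status": "open", "updated_at": "2024-06-01T12:00:00"},
        {"id": 2, "status": "shipped", "updated_at": "2024-06-02T08:30:00"},
    ]
```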