
Filesystem destination does not raise exception when using scd2 merge strategy

Open Nintorac opened this issue 1 year ago • 5 comments

dlt version

dlt==0.5.1

Describe the problem

I have set write_disposition={'disposition': 'merge', 'strategy':'scd2'}

Initially, when I ran this with an s3 destination it worked, but when I ran it with a local filesystem destination it raised dlt.common.destination.exceptions.DestinationCapabilitiesException: 'scd2' merge strategy not supported for 'filesystem' destination.

However, while writing the reproduction I found that it no longer raises this exception in either case.

Expected behavior

No response

Steps to reproduce

Clone this repo https://github.com/Nintorac/dlt-merge-strategy-issue-repro

and run docker compose up

Operating system

Linux

Runtime environment

Local

Python version

3.10

dlt data source

No response

dlt destination

Filesystem & buckets

Other deployment details

No response

Additional information

No response

Nintorac avatar Jul 24 '24 10:07 Nintorac

@Nintorac thanks for taking the time to create the repo.

I cloned the repo and ran docker compose up.

As you see in the screenshot, I get exec ./run.sh: no such file or directory.

Do I need to do anything else to make it work?

jorritsandbrink avatar Jul 24 '24 13:07 jorritsandbrink

Should be all there, I will double check that I can run it from a fresh clone when I am back at the computer.

But I do see the run.sh in the repo - https://github.com/Nintorac/dlt-merge-strategy-issue-repro/blob/main/run.sh

I see you are running Windows, so maybe the run.sh script is losing its execute permission bit when you clone. You could try adding RUN chmod +x run.sh after the COPY . . line in the Dockerfile.

Nintorac avatar Jul 24 '24 17:07 Nintorac

I think I understand what's going on here. I don't think it has anything to do with local versus s3. I think it has to do with dlt version differences.

I noticed that the behavior in dlt==0.5.1 differs from the behavior in dlt==0.5.2a2.

Setting write_disposition="merge" will succeed on both versions:

import dlt
from dlt.destinations import filesystem

assert dlt.__version__ in ("0.5.1", "0.5.2a2")

pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination=filesystem(bucket_url="file://_storage"),
)

pipeline.run(
    [{"foo": 1}], 
    table_name="my_table",
    write_disposition="merge",
    # write_disposition={"disposition": "merge", "strategy": "scd2"},
)

print(
    "I ran without errors, because I silently ignored the `merge` write"
    " disposition and used `append` instead."
)

Setting write_disposition={"disposition": "merge", "strategy": "scd2"} will succeed on 0.5.1, but fail on 0.5.2a2:

import dlt
from dlt.destinations import filesystem

assert dlt.__version__ == "0.5.2a2"

pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination=filesystem(bucket_url="file://_storage"),
)

pipeline.run(
    [{"foo": 1}], 
    table_name="my_table",
    # write_disposition="merge",
    write_disposition={"disposition": "merge", "strategy": "scd2"},
)

# dlt.common.destination.exceptions.DestinationCapabilitiesException: `scd2` merge strategy not supported for `filesystem` destination.

@Nintorac could it be you have been using different dlt versions?

jorritsandbrink avatar Jul 30 '24 11:07 jorritsandbrink

Maybe, but I don't think so. I am using poetry and it appears to ignore the pre-release versions.

You also mentioned an exception in 0.5.1, right?

Nintorac avatar Jul 31 '24 04:07 Nintorac

@Nintorac I did mention that, but it was an incorrect statement. The check that raises that exception was introduced after 0.5.1.

I ran dlt --version back then to check my version and it showed 0.5.1, but that must have been for the dlt package I had installed in my global Python env, not the env I used to run the code that threw the exception.
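To rule out this kind of mix-up, the version can be checked from inside the interpreter that actually runs the pipeline. The snippet below is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration and is not part of dlt.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` visible to this interpreter, or None.

    Hypothetical helper for illustration; it is not part of dlt.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this with the same interpreter (for example via `poetry run python`)
# that executes the pipeline, not with a globally installed one.
print("interpreter:", sys.executable)
print("dlt version:", installed_version("dlt"))
```

If the printed interpreter path points outside the project's virtual environment, the dlt --version output most likely came from a different installation.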

In any case, the version difference is the only thing we're able to reproduce, and it can be explained: upsert support for the filesystem destination (when using the delta table format) was added after version 0.5.1. That feature came with a new check on supported merge strategies, which is why dlt.common.destination.exceptions.DestinationCapabilitiesException ("`scd2` merge strategy not supported for `filesystem` destination.") is raised on 0.5.2a2 but not on 0.5.1.
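
To illustrate what such a check does, here is a rough sketch in plain Python. It is not dlt's actual implementation: the exception class, the capability table, and the function name are all hypothetical stand-ins, and the table only encodes what is stated in this thread (upsert is supported for filesystem, scd2 is not).

```python
class MergeStrategyNotSupported(Exception):
    """Hypothetical stand-in for dlt's DestinationCapabilitiesException."""


# Hypothetical capability table; the real values live inside dlt.
SUPPORTED_MERGE_STRATEGIES = {
    "filesystem": {"upsert"},
}


def check_merge_strategy(destination: str, strategy: str) -> None:
    """Raise if `strategy` is not listed for `destination`."""
    supported = SUPPORTED_MERGE_STRATEGIES.get(destination, set())
    if strategy not in supported:
        raise MergeStrategyNotSupported(
            f"`{strategy}` merge strategy not supported "
            f"for `{destination}` destination."
        )


check_merge_strategy("filesystem", "upsert")  # passes silently

try:
    check_merge_strategy("filesystem", "scd2")
except MergeStrategyNotSupported as exc:
    print(exc)
```

Without a check like this, an unsupported strategy goes unnoticed, which matches the run on 0.5.1 completing without an error.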

jorritsandbrink avatar Jul 31 '24 08:07 jorritsandbrink