
creates a single source in extract for all resource instances passed as list

Open rudolfix opened this issue 1 year ago • 3 comments

Description

We discovered a peculiar problem with rest_api when users passed a list of resources to the run function that contained both a resource and a transformer:

    # this snippet lives inside a function body (hence `nonlocal` below)
    page = 1

    @dlt.resource(name="pages")
    def gen_pages():
        nonlocal page
        while True:
            yield {"page": page}
            if page == 10:
                return
            page += 1

    @dlt.transformer(name="subpages")
    def get_subpages(page_item):
        yield from [
            {
                "page": page_item["page"],
                "subpage": subpage,
            }
            for subpage in range(1, 11)
        ]

    pipeline = dlt.pipeline("test_resource_transformer_standalone", destination="duckdb")
    # here we must combine resources and transformers using the same instance
    info = pipeline.run([gen_pages, gen_pages | get_subpages])

In the case above, only the last page is passed to the transformer (see the commits for tests and details). The root cause is that each resource in the list was packaged in a separate source and extracted separately. That prevented any DAG optimizations, so gen_pages was extracted twice.
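To make the failure mode concrete, here is a plain-Python sketch that does not use dlt at all. It assumes the two separate extractions behave like two independent iterations over the same generator function; `make_pages`, `first_pass` and `second_pass` are hypothetical names used only for illustration.

```python
def make_pages():
    # Shared state, like the `page` variable in the snippet above.
    page = 1

    def gen_pages():
        nonlocal page
        while True:
            yield {"page": page}
            if page == 10:
                return
            page += 1

    return gen_pages


gen_pages = make_pages()

# First "extraction": the resource on its own consumes pages 1..10.
first_pass = [item["page"] for item in gen_pages()]

# Second "extraction": the transformer's parent starts from the leftover
# state, so it only ever sees the last page.
second_pass = [item["page"] for item in gen_pages()]

print(first_pass)
print(second_pass)
```

Because the counter lives outside the generator, the second iteration starts where the first one stopped, which matches the symptom described above.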

Here we change the behavior so that a single dlt source is used to extract all the resources in the list.
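The idea of extracting the parent only once can be sketched in plain Python. This is not dlt's implementation, only an illustration under the assumption that a shared parent is iterated a single time and each item is fanned out to both outputs; `extract_once` and the `tables` dict are hypothetical.

```python
def gen_pages():
    # Parent resource: yields pages 1..10.
    for page in range(1, 11):
        yield {"page": page}


def get_subpages(page_item):
    # Transformer: yields 10 subpages for every page item.
    for subpage in range(1, 11):
        yield {"page": page_item["page"], "subpage": subpage}


def extract_once(parent, transformer):
    """Iterate the parent a single time and feed each item to both outputs."""
    tables = {"pages": [], "subpages": []}
    for item in parent():
        tables["pages"].append(item)
        tables["subpages"].extend(transformer(item))
    return tables


tables = extract_once(gen_pages, get_subpages)
print(len(tables["pages"]), len(tables["subpages"]))
```

With a single pass, every page reaches the transformer, regardless of whether the parent keeps state between runs.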

rudolfix avatar Jul 02 '24 17:07 rudolfix

Deploy Preview for dlt-hub-docs canceled.

Name Link
Latest commit 1f26f722212d2b894a73b5d1ff290fd8cc071564
Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66e494729541520008ee968d

netlify[bot] avatar Jul 02 '24 17:07 netlify[bot]

@sh-rp I'll add this to our release notes as one of the changes

rudolfix avatar Jul 03 '24 12:07 rudolfix

@sh-rp Also, you are partially right about the parallelism! We have tests that pass a list of many resources with the same names, and those tests are failing. We'd need to package them in separate sources and execute them one by one.

rudolfix avatar Jul 03 '24 12:07 rudolfix
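One way to picture the "separate sources, one by one" idea from the last comment is to split the input list into batches in which every resource name is unique. The sketch below is only an assumption about how such a split could look, not what dlt does; `group_by_unique_name` is a hypothetical helper and works on plain name strings.

```python
from collections import defaultdict


def group_by_unique_name(resource_names):
    """Split names into batches so that no batch contains a name twice."""
    seen = defaultdict(int)  # how many times each name has appeared so far
    batches = []
    for name in resource_names:
        index = seen[name]   # the n-th repeat of a name goes to batch n
        seen[name] += 1
        if index == len(batches):
            batches.append([])
        batches[index].append(name)
    return batches


batches = group_by_unique_name(["pages", "subpages", "pages", "pages"])
print(batches)
```

Each batch could then be wrapped in its own source and run in sequence, while resources inside one batch still share a single extraction.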