creates a single source in extract for all resource instances passed as list
Description
We discovered a peculiar problem with rest_api when users passed
the run function a list of resources that contained both
a resource and a transformer:
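import dlt


# `nonlocal` requires an enclosing function, so the snippet is wrapped in one here
def test_resource_transformer_standalone():
    page = 1

    @dlt.resource(name="pages")
    def gen_pages():
        # the page counter is shared by every run of this generator
        nonlocal page
        while True:
            yield {"page": page}
            if page == 10:
                return
            page += 1

    @dlt.transformer(name="subpages")
    def get_subpages(page_item):
        # emit 10 subpages for each page received from the parent resource
        yield from [
            {
                "page": page_item["page"],
                "subpage": subpage,
            }
            for subpage in range(1, 11)
        ]

    pipeline = dlt.pipeline("test_resource_transformer_standalone", destination="duckdb")
    # here we must combine resources and transformers using the same instance
    info = pipeline.run([gen_pages, gen_pages | get_subpages])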
In the case above, only the last page is passed to the transformer (see the commits for tests and details).
The root cause is that each resource in the list was packaged in a separate source and extracted separately. That prevented any DAG optimizations, and gen_pages was extracted twice.
Here we change the behavior so that a single dlt source is used to extract all the resources in the list.
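The effect of extracting the same stateful generator twice can be reproduced without dlt. The sketch below is only a plain-Python simulation under stated assumptions: `make_gen_pages`, `extract_separately` and `extract_together` are hypothetical helpers written for this illustration, not dlt APIs, and they mimic "one source per list item" versus "one source for the whole list" in the simplest possible way.

```python
def make_gen_pages():
    """Build a generator function that keeps its page counter in shared state."""
    page = 1

    def gen_pages():
        nonlocal page
        while True:
            yield {"page": page}
            if page == 10:
                return
            page += 1

    return gen_pages


def get_subpages(page_item):
    """Plain-function stand-in for the transformer: 10 subpages per page."""
    return [{"page": page_item["page"], "subpage": s} for s in range(1, 11)]


def extract_separately():
    """Simulates the old behavior: each list item is extracted on its own."""
    gen_pages = make_gen_pages()
    # first extraction: the plain resource consumes pages 1..10
    pages = list(gen_pages())
    # second extraction: the generator runs again, but the counter is already at 10
    subpages = [row for item in gen_pages() for row in get_subpages(item)]
    return pages, subpages


def extract_together():
    """Simulates the new behavior: the generator runs once and feeds both outputs."""
    gen_pages = make_gen_pages()
    pages, subpages = [], []
    for item in gen_pages():
        pages.append(item)
        subpages.extend(get_subpages(item))
    return pages, subpages
```

In this simulation `extract_separately()` gives the transformer stand-in only the final page, while `extract_together()` gives it every page, which matches the symptom and the fix described above.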