
Nodes with same output dataset for Partitioned Scenarios

Open · mehrzadai opened this issue 1 year ago · 1 comment

I faced an issue that may already have a solution I'm not aware of, or that may be addressed in the future. I have a scenario with different categories of big data, e.g. rates, sales, views, and reviews, that I want to join together. I don't want a separate dataset in my catalog for each category; instead, I want to save each one as a partition of a single dataset, something like this:

concat:
  type: PartitionedDataset

node(views -> concat), node(rates -> concat), ...

This way, I keep node connectivity and lazy save/load at the same time. But currently the rule is: `kedro.pipeline.pipeline.OutputNotUniqueError: Output(s) ['concat'] are returned by more than one nodes. Node outputs must be unique.` I can save my partitions like:

rates:
  type: CSVDataset
views:
  type: CSVDataset
...

and load the partitioned dataset in another node, but this way I lose the connectivity of my nodes. I suggest relaxing this rule for partitioned datasets, so that each partition can be saved by a different node.
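For what it's worth, one workaround under the current rule is to route all category outputs through a single "gathering" node whose one output is the partitioned dataset. This is a minimal sketch, assuming Kedro's documented behavior that `PartitionedDataset` saves lazily when the values of the returned dict are callables (the function name `gather_partitions` and the catalog paths are hypothetical):

```python
def gather_partitions(rates, sales, views, reviews):
    """Return a dict of partition-name -> callable.

    PartitionedDataset invokes each callable only at save time, so
    saving stays lazy, and because a single node produces ``concat``,
    OutputNotUniqueError is avoided while node connectivity is kept.
    """
    parts = {"rates": rates, "sales": sales, "views": views, "reviews": reviews}
    # Bind each value as a default argument so every lambda captures
    # its own data rather than the last loop variable.
    return {name: (lambda data=data: data) for name, data in parts.items()}


# Hypothetical catalog entry (catalog.yml):
#
# concat:
#   type: partitions.PartitionedDataset
#   path: data/02_intermediate/concat
#   dataset: pandas.CSVDataset
#
# Hypothetical pipeline wiring:
#
# node(gather_partitions, ["rates", "sales", "views", "reviews"], "concat")
```

The trade-off is an extra node, and all category datasets must be named as its inputs, which is exactly the coupling the issue is asking to avoid.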

mehrzadai avatar Dec 20 '23 08:12 mehrzadai