Nodes with same output dataset for Partitioned Scenarios
I ran into an issue that may already have a solution I am not aware of, or that may be addressed in a future release.
I have a scenario with several categories of big data, e.g. rates, sales, views, and reviews, and I want to join them together.
I don't want a separate dataset for each category in my catalog; instead, I want to save each one as a partition of a single dataset, something like this:
concat:
  type: Partitioned
node(views -> concat), node(rates -> concat), ...
This way, I would get both node connectivity and lazy save/load at the same time.
But currently this is not allowed, and the pipeline fails with:
kedro.pipeline.pipeline.OutputNotUniqueError: Output(s) ['concat'] are returned by more than one nodes. Node outputs must be unique.
I can save my partitions as separate datasets like this:
rates:
  type: CSVDataset
views:
  type: CSVDataset
...
and load the partitioned dataset in another node, but this way I lose the connectivity between my nodes. I think this rule should be relaxed for partitioned datasets, so that each partition can be saved by a different node.
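One possible workaround, sketched under assumptions rather than offered as a confirmed Kedro recommendation: keep a single node as the only writer of concat and have it return one entry per category. As I understand Kedro's partitioned dataset behaviour, saving accepts a dict of partition id to data (or to a zero-argument callable for lazy saving), and loading yields a dict of partition id to load function. The function names collect_partitions and join_partitions, the "joined" output, and the Rows stand-in type are hypothetical and only for illustration.

```python
from typing import Callable, Dict, List

# Stand-in for a DataFrame so the sketch has no third-party dependencies.
Rows = List[dict]


def collect_partitions(
    rates: Rows, sales: Rows, views: Rows, reviews: Rows
) -> Dict[str, Callable[[], Rows]]:
    """Hypothetical node function: the single writer of the 'concat' output.

    Returns one zero-argument callable per category, so a dataset that
    supports lazy saving can materialise the partitions one at a time.
    """
    sources = {"rates": rates, "sales": sales, "views": views, "reviews": reviews}
    # Bind each value through a default argument so every lambda keeps its own data.
    return {name: (lambda data=data: data) for name, data in sources.items()}


def join_partitions(partitions: Dict[str, Callable[[], Rows]]) -> Rows:
    """Hypothetical downstream node: load each partition lazily and join them."""
    joined: Rows = []
    for name in sorted(partitions):
        # The partition is only loaded when its callable is invoked here.
        for row in partitions[name]():
            joined.append({"category": name, **row})
    return joined


# Assumed wiring in the pipeline definition (not executed in this sketch):
#   node(collect_partitions, ["rates", "sales", "views", "reviews"], "concat")
#   node(join_partitions, "concat", "joined")
```

With this layout the upstream nodes stay connected to concat through collect_partitions, and rates, sales, views, and reviews do not need catalog entries if keeping them as in-memory intermediate outputs is acceptable. The trade-off is that the inputs of collect_partitions are themselves passed between nodes in memory, so this does not fully replace per-node partition writes.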