kedro-plugins
Partition Dataset Overwrite not working as expected
Description
I'm trying to use the PartitionedDataset with the overwrite parameter set to True, but it removes partitions other than the one I am saving.
Context
I have the following partitions in my file storage (S3):
"2023-08-01/se/1/orders" "2023-08-01/se/3/orders" "2023-08-01/se/2/orders"
When a function processes a single partition such as "2023-08-01/se/1/orders" and tries to save it back with overwrite set to True, all the other partitions are removed.
I suspect the cause is the recursive parameter here: https://github.com/kedro-org/kedro/blob/main/kedro/io/partitioned_dataset.py#L308
Steps to Reproduce
Expected Result
Before Save: "2023-08-01/se/1/orders" "2023-08-01/se/3/orders" "2023-08-01/se/2/orders"
After Save: "2023-08-01/se/1/orders" "2023-08-01/se/3/orders" "2023-08-01/se/2/orders"
Actual Result
Before Save: "2023-08-01/se/1/orders" "2023-08-01/se/3/orders" "2023-08-01/se/2/orders"
After Save: "2023-08-01/se/1/orders"
- Kedro version used (pip show kedro or kedro -V): 0.18.11
- Python version used (python -V): 3.10
- Operating system and version: Ubuntu 18.04
@fazilhero This is the expected behavior - https://docs.kedro.org/en/stable/kedro.io.PartitionedDataset.html#kedro.io.PartitionedDataset
overwrite – If True, any existing partitions will be removed.
Could you elaborate on your use case and what you are trying to do? We just had a discussion in https://github.com/kedro-org/kedro/issues/2857 about supporting versioning of PartitionedDataset. Are you trying to overwrite only some of the partitions, or is versioning of PartitionedDataset actually what you want?
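For illustration, here is a minimal sketch of the two behaviors being discussed. It is not Kedro's code: it simulates a partitioned save on a local temporary directory using only the standard library, and it assumes that overwrite=True clears the whole dataset path recursively before writing, as the report and the documentation quote suggest. The names save_partitions, list_partitions and demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def save_partitions(root: Path, partitions: dict[str, str], overwrite: bool) -> None:
    """Hypothetical stand-in for a partitioned save; not Kedro's actual code."""
    if overwrite and root.exists():
        # Assumed semantics: the entire dataset root is removed recursively,
        # so every existing partition disappears, not just the ones being saved.
        shutil.rmtree(root)
    for partition_id, payload in partitions.items():
        target = root / partition_id
        target.parent.mkdir(parents=True, exist_ok=True)
        # Writing a partition replaces only the file at that partition's path.
        target.write_text(payload)


def list_partitions(root: Path) -> list[str]:
    """Return the partition ids (relative file paths) found under root, sorted."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def demo(overwrite: bool) -> list[str]:
    """Create three partitions, re-save only the first, and list what remains."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data"
        existing = {f"2023-08-01/se/{i}/orders": "old" for i in (1, 2, 3)}
        save_partitions(root, existing, overwrite=False)
        save_partitions(root, {"2023-08-01/se/1/orders": "new"}, overwrite=overwrite)
        return list_partitions(root)


if __name__ == "__main__":
    print(demo(overwrite=True))   # only the re-saved partition remains
    print(demo(overwrite=False))  # all three partitions remain
```

Under these assumptions, saving a single partition with overwrite disabled replaces only that partition's file, which may be closer to what the report expected.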
My understanding was that when I write a partition such as "2023-08-01/se/1/orders", it should either overwrite that partition or throw an error. I was confused when it deleted different partitions such as "2023-08-01/se/2/orders". I now realize the doc says any existing partitions, so I suppose you really intended to delete all partitions.
Sorry for the confusion, I think what you say makes sense. @stichbury
Closed as this is documented, expected behavior. If this is a desired feature, feel free to open a separate issue as a feature request or submit a PR.