Partition Dataset Overwrite not working as expected

Open fazilhero opened this issue 2 years ago • 3 comments

Description

I'm trying to use the PartitionedDataset with the overwrite parameter set to True, but it deletes completely different partitions from the one being saved.

Context

I have the following partitions in my file storage (S3):

"2023-08-01/se/1/orders" "2023-08-01/se/3/orders" "2023-08-01/se/2/orders"

When a function processes a single partition such as "2023-08-01/se/1/orders" and tries to save it back with overwrite set to True, all the other partitions are removed.

I believe the suspected error is in the recursive param here: https://github.com/kedro-org/kedro/blob/main/kedro/io/partitioned_dataset.py#L308

Steps to Reproduce

Expected Result

Before Save: "2023-08-01/se/1/orders" "2023-08-01/se/3/orders" "2023-08-01/se/2/orders"

After Save: "2023-08-01/se/1/orders" "2023-08-01/se/3/orders" "2023-08-01/se/2/orders"

Actual Result

Before Save: "2023-08-01/se/1/orders" "2023-08-01/se/3/orders" "2023-08-01/se/2/orders"

After Save: "2023-08-01/se/1/orders"
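To make the before/after listing concrete, here is a minimal standard-library sketch of the behavior described above. It is not Kedro's code: `save_partitions`, `list_partitions` and `demo` are hypothetical names, local files stand in for S3, and the sketch assumes that overwrite removes the whole dataset root recursively before writing, which is what the linked line appears to do.

```python
import shutil
import tempfile
from pathlib import Path


def save_partitions(root: Path, data: dict[str, str], overwrite: bool = False) -> None:
    """Hypothetical stand-in for a partitioned save; not Kedro's actual code."""
    if overwrite and root.exists():
        # Assumption: the whole dataset root is removed recursively,
        # so partitions that are absent from `data` disappear as well.
        shutil.rmtree(root)
    for partition_id, content in sorted(data.items()):
        target = root / partition_id
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)


def list_partitions(root: Path) -> list[str]:
    """Return partition ids (file paths relative to the root), sorted."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def demo() -> tuple[list[str], list[str]]:
    """Write three partitions, then save one of them back with overwrite=True."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "dataset"
        save_partitions(root, {f"2023-08-01/se/{i}/orders": "old" for i in (1, 2, 3)})
        before = list_partitions(root)
        save_partitions(root, {"2023-08-01/se/1/orders": "new"}, overwrite=True)
        after = list_partitions(root)
    return before, after
```

Under that assumption, `demo()` returns all three partitions for the first listing and only "2023-08-01/se/1/orders" for the second, matching the actual result reported here.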

  • Kedro version used (pip show kedro or kedro -V): 0.18.11
  • Python version used (python -V): 3.10
  • Operating system and version: Ubuntu 18.04

fazilhero avatar Aug 02 '23 16:08 fazilhero

@fazilhero This is the expected behavior - https://docs.kedro.org/en/stable/kedro.io.PartitionedDataset.html#kedro.io.PartitionedDataset

overwrite – If True, any existing partitions will be removed.

Could you elaborate on your use case and what you are trying to do? We just had a discussion in https://github.com/kedro-org/kedro/issues/2857 about supporting versioning of PartitionedDataset. Are you trying to overwrite only some of the partitions, or is versioning of PartitionedDataset what you actually want?

noklam avatar Aug 03 '23 13:08 noklam

My understanding was that when I write a partition "2023-08-01/se/1/orders", it should either overwrite that partition or throw an error. I was a bit confused when it deleted different partitions such as "2023-08-01/se/2/orders". I now realize the doc says any partition, so I suppose you really did intend to delete all partitions.
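For comparison, the per-partition behavior I had in mind could be sketched like this. It is a standard-library illustration only, not a Kedro API: `save_selected_partitions` and `demo` are hypothetical names, and local files stand in for S3. Only the partitions present in the data being saved are replaced, and every other partition is left untouched.

```python
import tempfile
from pathlib import Path


def save_selected_partitions(root: Path, data: dict[str, str]) -> None:
    """Hypothetical helper: replace only the partitions named in `data`."""
    for partition_id, content in sorted(data.items()):
        target = root / partition_id
        if target.exists():
            # Remove just this partition, never its siblings.
            target.unlink()
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)


def demo() -> dict[str, str]:
    """Write three partitions, then replace only the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "dataset"
        save_selected_partitions(
            root, {f"2023-08-01/se/{i}/orders": "old" for i in (1, 2, 3)}
        )
        save_selected_partitions(root, {"2023-08-01/se/1/orders": "new"})
        # Map each partition id to its content so the result can be inspected.
        return {
            p.relative_to(root).as_posix(): p.read_text()
            for p in sorted(root.rglob("*"))
            if p.is_file()
        }
```

In this sketch all three partitions survive the second save, and only "2023-08-01/se/1/orders" has new content.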

fazilhero avatar Aug 03 '23 21:08 fazilhero

Sorry for the confusion, I think what you said makes sense. @stichbury

noklam avatar Aug 03 '23 21:08 noklam

Closed, as this is documented, expected behavior. If this is a desired feature, feel free to open a separate issue as a feature request or submit a PR.

noklam avatar Jul 16 '24 14:07 noklam