[SPARK-46714][SQL] Overwrite a partition with custom location

Open adrian-wang opened this issue 1 year ago • 5 comments

What changes were proposed in this pull request?

We sometimes use more than one filesystem for a data warehouse, for example one for hot/warm data and another for cold data, with different storage backends to reduce total cost. However, once Spark converts a table write into a data source write, overwriting a partition in this setup does not work as expected.

Before this patch, when overwriting a partition with a custom location:

  1. If the partition location is on the same filesystem as its table, the partition location remains the same.
  2. Otherwise, Spark throws an exception: java.lang.IllegalArgumentException: Wrong FS: ...

After this patch, the behavior aligns with Hive: the overwritten partition is recreated under the table location.
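The before/after rules described above can be modeled with a short, self-contained Python sketch. This is not Spark code and not the patch itself: the function names, the `patched` flag, the example URIs, and the use of `ValueError` as a stand-in for the JVM "Wrong FS" exception are all assumptions made for illustration. It also assumes that two locations share a filesystem when their URI scheme and authority match, and it ignores the escaping that Hive applies to partition values.

```python
from urllib.parse import urlparse


def same_filesystem(table_location: str, partition_location: str) -> bool:
    """Assumption: same filesystem means same URI scheme and authority."""
    t, p = urlparse(table_location), urlparse(partition_location)
    return (t.scheme, t.netloc) == (p.scheme, p.netloc)


def default_partition_path(table_location: str, partition_spec: dict) -> str:
    """Hive-style layout <table location>/<col>=<value>/... (no escaping)."""
    suffix = "/".join(f"{col}={val}" for col, val in partition_spec.items())
    return table_location.rstrip("/") + "/" + suffix


def overwrite_location(table_location: str, partition_location: str,
                       partition_spec: dict, patched: bool) -> str:
    """Return where the overwritten partition ends up, per the PR description."""
    if patched:
        # After the patch: recreated under the table location, as in Hive.
        return default_partition_path(table_location, partition_spec)
    if same_filesystem(table_location, partition_location):
        # Before the patch, same filesystem: the custom location is kept.
        return partition_location
    # Before the patch, different filesystem: stand-in for the JVM exception.
    raise ValueError(f"Wrong FS: {partition_location}")


if __name__ == "__main__":
    table = "hdfs://hot-nn:8020/warehouse/t"          # hypothetical hot storage
    cold = "s3a://cold-bucket/t/dt=2024-01-14"        # hypothetical cold storage
    print(overwrite_location(table, cold, {"dt": "2024-01-14"}, patched=True))
```

With `patched=False`, the same call raises the stand-in error, which corresponds to case 2 above.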

Why are the changes needed?

  1. To align the behavior with Hive.
  2. To support existing partitions on a filesystem separate from the table location.

Does this PR introduce any user-facing change?

Yes. Before this patch, when overwriting a partition with a custom location:

  1. If the partition location is on the same filesystem as its table, the partition location remains the same.
  2. Otherwise, Spark throws an exception: java.io.IOException: Wrong FS ...

After this patch, the behavior aligns with Hive: the overwritten partition is recreated under the table location.

How was this patch tested?

Added a unit test case.

Was this patch authored or co-authored using generative AI tooling?

No.

Jan 14 '24 14:01 adrian-wang

cc @cloud-fan FYI

Jan 14 '24 15:01 LuciferYang

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

May 03 '24 00:05 github-actions[bot]

@LuciferYang can you please reopen this pull request?

May 11 '24 06:05 adrian-wang

@adrian-wang done

May 12 '24 10:05 LuciferYang

@cloud-fan Can you help review this pull request?

Jun 06 '24 02:06 adrian-wang

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Sep 15 '24 00:09 github-actions[bot]