hudi [HUDI-8036] Handle partition schema for custom key gen in SparkHoodieTableFileIndex

Open lokeshj1703 opened this issue 1 year ago • 1 comment

Change Logs

Currently, the partition schema that SparkHoodieTableFileIndex defines for a table does not account for the partition types of the individual partition columns. With the custom key generator, these partition types are simple and timestamp. This Jira aims to handle both partition types and to reproduce the issue reported in https://github.com/apache/hudi/issues/8343.
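For context, the custom key generator takes a per-column partition type, commonly written as `field:TYPE` pairs in the partition path field setting. The sketch below is plain Python, not Hudi code. It only illustrates how such a spec splits into (column, type) pairs. The column names `region` and `ts` and the helper `parse_partition_spec` are hypothetical.

```python
from typing import List, Tuple

# Partition types named in the description above for the custom key generator.
SUPPORTED_TYPES = {"SIMPLE", "TIMESTAMP"}


def parse_partition_spec(spec: str) -> List[Tuple[str, str]]:
    """Split a 'field:TYPE,field:TYPE' spec into (field, TYPE) pairs.

    Hypothetical helper for illustration; it is not part of Hudi.
    """
    pairs = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        field, sep, ptype = entry.partition(":")
        if not sep:
            raise ValueError(f"missing partition type in {entry!r}")
        ptype = ptype.strip().upper()
        if ptype not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported partition type {ptype!r}")
        pairs.append((field.strip(), ptype))
    return pairs


# Example spec with one simple and one timestamp partition column.
print(parse_partition_spec("region:SIMPLE,ts:TIMESTAMP"))
```

A file index that ignores the second element of each pair treats every partition column the same way, which is the gap this change addresses.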

Changes in the PR - HoodieBaseRelation and the snapshot, incremental, etc. queries now use a separate file index. This reader file index uses string as the schema type for timestamp partition columns. The logical plans for the insert into, merge into and update table commands are changed to swap the reader file index back to the original file index, so that the table schema does not change.
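The reader-side rule described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in this PR: the function `reader_partition_schema`, the dict-based schema representation, and the column names are all assumptions. The likely motivation for string is that a timestamp partition value is written into the partition path as formatted text, while the data column itself may be a numeric or timestamp type.

```python
from typing import Dict, List, Tuple


def reader_partition_schema(
    table_schema: Dict[str, str],
    partition_fields: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Build the partition schema a reader file index would expose.

    Timestamp partition columns are typed as string; simple partition
    columns keep the type declared in the table schema. The table schema
    itself is never modified. Hypothetical helper, not Hudi's API.
    """
    schema = {}
    for name, ptype in partition_fields:
        if ptype.upper() == "TIMESTAMP":
            schema[name] = "string"  # path value is formatted text
        else:
            schema[name] = table_schema[name]  # keep the declared type
    return schema


# Hypothetical table: 'ts' is stored as long but partitioned by timestamp.
table_schema = {"id": "long", "region": "string", "ts": "long"}
partition_fields = [("region", "SIMPLE"), ("ts", "TIMESTAMP")]

print(reader_partition_schema(table_schema, partition_fields))
print(table_schema["ts"])  # the table schema still reports the original type
```

Keeping the original schema untouched mirrors why the write-path commands switch back to the original file index: only reads should see the string-typed partition column.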

Impact

Risk level (write none, low, medium or high below)

low

Documentation Update

Contributor's checklist

[ ] Read through contributor's guide
[ ] Change Logs and Impact were stated clearly
[ ] Adequate tests were added if applicable
[ ] CI passed

Jul 31 '24 11:07 lokeshj1703

CI report:

3f7cca22feffc4f468eb6e17d6fdabafc0b595c5 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

Sep 05 '24 18:09 hudi-bot
