
[HUDI-7431] Add replication and block size to StoragePathInfo to be backwards compatible

Open yihua opened this issue 1 year ago • 1 comment

Change Logs

This PR adds replication and block size information to StoragePathInfo so that it stays backward compatible in two respects: a FileStatus can be generated from a StoragePathInfo without losing fields, and Hive's FileInputFormat can properly generate splits based on the block size. Hive's relevant logic is shown below. Without this change, the replication and block size are dropped, so the block size is reported as 0 and Hive's input format generates a huge number of splits of size 1, causing a performance regression.
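To make the effect of a zero block size concrete, here is a minimal sketch. It assumes the logic referred to above follows the `max(minSize, min(goalSize, blockSize))` rule used by Hadoop's `mapred` `FileInputFormat`. The class name, helper names, and file sizes are hypothetical and chosen only for illustration; the split count ignores the slop factor the real implementation applies.

```java
// Illustrative sketch only: not code from this PR or from Hive/Hadoop.
class SplitSizeSketch {

    // Same shape as the split-size rule: max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Simplified split count: ceil(fileLength / splitSize), no slop factor.
    static long countSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long fileLength = 1L << 30;      // hypothetical 1 GiB file
        long goalSize = fileLength;      // assumes a single requested split
        long minSize = 1L;               // assumed minimum split size
        long realBlockSize = 128L << 20; // hypothetical 128 MiB block size
        long droppedBlockSize = 0L;      // block size lost in conversion

        long goodSplit = computeSplitSize(goalSize, minSize, realBlockSize);
        long badSplit = computeSplitSize(goalSize, minSize, droppedBlockSize);

        System.out.println("split size with block size:    " + goodSplit);
        System.out.println("splits with block size:        " + countSplits(fileLength, goodSplit));
        System.out.println("split size without block size: " + badSplit);
        System.out.println("splits without block size:     " + countSplits(fileLength, badSplit));
    }
}
```

Under these assumptions, a block size of 0 makes `min(goalSize, blockSize)` collapse to 0, so the split size falls back to the minimum and the number of splits grows to roughly one per byte of input.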

Screenshot 2024-02-20 at 12 15 53

This fixes the test issue encountered while integrating the HoodieStorage abstraction; see #10591.

Impact

Fixes backward compatibility in the HoodieStorage abstraction.

Risk level

low

Documentation Update

N/A

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

yihua, Feb 20 '24 22:02

CI report:

  • 1ccf94bbcef53d3c4b3d14a3953b432f698e52d3 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot, Feb 27 '24 22:02