[HUDI-5829] Optimize conversion from json to row format when sanitizing field names
Change Logs
Currently, when source data has to be read in row format and sanitization is enabled, we first read the data in Avro format (which supports sanitization) and then convert from Avro to row. This change simplifies the process by converting directly from JSON to row while applying sanitization.
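To illustrate the idea of sanitizing field names during a direct JSON-to-row conversion, here is a minimal, language-neutral sketch in Python. It is not the code in this PR: the function names, the `MASK` value, and the choice to prefix names that start with a digit are all assumptions made for illustration. The only rule taken as given is that Avro names must match `[A-Za-z_][A-Za-z0-9_]*`.

```python
import json
import re

# Hypothetical replacement string for invalid characters (an assumption;
# the actual mask used by the project is not stated in this description).
MASK = "__"

# Any character that is not allowed anywhere in an Avro name.
_INVALID_CHAR = re.compile(r"[^A-Za-z0-9_]")


def sanitize_name(name, mask=MASK):
    """Rewrite a field name so it matches [A-Za-z_][A-Za-z0-9_]*."""
    cleaned = _INVALID_CHAR.sub(mask, name)
    # Names may not be empty or start with a digit; prefixing the mask is
    # one possible policy, chosen here only for illustration.
    if not cleaned or cleaned[0].isdigit():
        cleaned = mask + cleaned
    return cleaned


def sanitize_value(value):
    """Recursively sanitize keys in nested objects and arrays."""
    if isinstance(value, dict):
        return {sanitize_name(k): sanitize_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_value(v) for v in value]
    return value


def json_to_row(line):
    """Parse one JSON record and sanitize it in a single pass,
    with no intermediate Avro representation."""
    return sanitize_value(json.loads(line))


row = json_to_row('{"9id": 1, "user-name": {"first name": "a"}, "tags": [{"k.v": 2}]}')
```

The point of the sketch is that renaming happens while the parsed JSON is being turned into the row structure, so each record is visited once rather than being materialized in one format and then converted to another.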
Impact
When source data has to be read in row format and sanitization is enabled, this change should make the conversion faster by converting directly from JSON to row instead of going through Avro first.
Risk level (write none, low, medium or high below)
None
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
- The config description must be updated if new configs are added or the default values of configs are changed
- Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instructions to make changes to the website.
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
CI report:
- 273deff6a679117963009c0093d06e8379752a16 UNKNOWN
- 0f1ea36efe07f0a9af5e84186ab0570adeb01906 Azure: FAILURE
- df60320cceb9a44b595b37cda1af24200de25893 UNKNOWN
Bot commands
@hudi-bot supports the following commands:
- @hudi-bot run azure: re-run the last Azure build