[HUDI-5829] Optimize conversion from json to row format when sanitizing field names

Open vamsikarnika opened this issue 1 year ago • 1 comments

Change Logs

Currently, when source data has to be read in row format and sanitization is enabled, we first read the data in Avro format (which supports sanitization) and then convert from Avro to row. This change simplifies the process by converting directly from JSON to row while applying sanitization.
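
For reviewers who want a concrete picture of "sanitizing while converting", here is a minimal Python sketch of the idea. It is not the code in this PR. The `MASK` value, the function names, and the Avro-style naming rule (first character in `[A-Za-z_]`, the rest in `[A-Za-z0-9_]`) are assumptions made for illustration, and a plain dict stands in for a row.

```python
import json
import re

# Assumed replacement for invalid characters; illustrative only,
# not read from any Hudi configuration.
MASK = "__"

_VALID_FIRST = re.compile(r"[A-Za-z_]")
_INVALID_REST = re.compile(r"[^A-Za-z0-9_]")


def sanitize_name(name, mask=MASK):
    """Rewrite a field name so it matches the assumed Avro-style rule."""
    if not name:
        return mask
    first = name[0] if _VALID_FIRST.match(name[0]) else mask
    return first + _INVALID_REST.sub(mask, name[1:])


def sanitize_value(value, mask=MASK):
    """Recursively sanitize the keys of nested objects and arrays."""
    if isinstance(value, dict):
        return {sanitize_name(k, mask): sanitize_value(v, mask)
                for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_value(v, mask) for v in value]
    return value


def json_to_row(line, mask=MASK):
    """Parse one JSON record and sanitize field names in a single pass,
    with no intermediate Avro record. The dict stands in for a row."""
    return sanitize_value(json.loads(line), mask)


if __name__ == "__main__":
    record = '{"9id": 1, "user-name": {"first.name": "a"}}'
    print(json_to_row(record))
```

Note that in this sketch two distinct source names can sanitize to the same name, in which case the later key silently overwrites the earlier one; a real implementation would need to decide how to handle such collisions.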

Impact

When source data has to be read in row format and sanitization is enabled, this change should make the conversion faster by converting directly from JSON to row instead of going through Avro first.

Risk level (write none, low, medium, or high below)

None

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

vamsikarnika • Aug 27 '24 12:08

CI report:

  • 273deff6a679117963009c0093d06e8379752a16 UNKNOWN
  • 0f1ea36efe07f0a9af5e84186ab0570adeb01906 Azure: FAILURE
  • df60320cceb9a44b595b37cda1af24200de25893 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot • Sep 04 '24 09:09