hudi
[HUDI-4584][Stacked on 6351] Fixing `SQLConf` not being propagated to executor
Change Logs
By default, `SQLConf` isn't propagated from the Driver to the Executors. This is why configuration specified with `--conf` is not respected by some components invoked on the Executor side (see https://github.com/apache/hudi/issues/6278).
In Spark, this has been remediated by the introduction of `SQLExecutionRDD`, which snapshots `SQLConf` and subsequently overrides it on the Executor side. However, this approach has the following limitations:
1. `SQLExecutionRDD` propagates `SQLConf` only for the RDDs preceding it (i.e. its dependencies), not for the ones chained after it.
2. `SQLExecutionRDD` is only applicable to `RDD[InternalRow]`.
To work around #2, we've introduced `SQLConfInjectingRDD` (which is just a generalization of `SQLExecutionRDD`) and made sure that it's properly injected (to address #1) inside, for example, the `HoodieSparkUtils.createRdd` invocation (which is carried out on the Executor side).
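The mechanism can be illustrated without Spark. The sketch below is a toy simulation in Python, not Spark or Hudi code: it assumes a per-thread config (loosely modeled on how `SQLConf` is resolved for the current thread) and uses a fresh thread as a stand-in for an Executor. `get_conf`, `with_existing_conf`, `ConfInjectingStage`, `read_timezone`, `chained_after` and `run_on_executor` are hypothetical names. A snapshot of the Driver's config is captured when the wrapper is constructed and re-installed on the "Executor" only while the wrapped (upstream) computation produces elements. The wrapper works for any element type, while a step chained after it still sees the defaults, mirroring the first limitation above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from typing import Callable, Dict, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# What a fresh "executor" thread sees when nothing was installed (stand-in for a default SQLConf).
DEFAULT_CONF: Dict[str, str] = {"spark.sql.session.timeZone": "UTC"}
_active = threading.local()  # per-thread active conf (hypothetical model, not Spark's internals)


def get_conf() -> Dict[str, str]:
    """Return the conf active on the current thread, falling back to the defaults."""
    return getattr(_active, "conf", None) or DEFAULT_CONF


@contextmanager
def with_existing_conf(conf: Dict[str, str]):
    """Install `conf` for the current thread only, restoring the previous value on exit."""
    old = getattr(_active, "conf", None)
    _active.conf = conf
    try:
        yield
    finally:
        _active.conf = old


class ConfInjectingStage(Generic[T]):
    """Hypothetical analogue of SQLConfInjectingRDD: wraps a parent partition function
    producing elements of any type T, re-installing a driver-side conf snapshot
    while the parent produces those elements on the executor."""

    def __init__(self, parent: Callable[[int], Iterable[T]]):
        self.parent = parent
        self.snapshot = dict(get_conf())  # captured on the driver, at construction time

    def compute(self, partition: int) -> Iterator[T]:
        with with_existing_conf(self.snapshot):
            it = iter(self.parent(partition))
        while True:
            # Only the parent's work (its "dependencies") runs under the snapshot.
            with with_existing_conf(self.snapshot):
                try:
                    item = next(it)
                except StopIteration:
                    return
            yield item  # code chained after this stage runs outside the snapshot


def read_timezone(partition: int) -> Iterator[str]:
    """A conf-dependent component invoked on the executor (hypothetical)."""
    for i in range(2):
        yield f"p{partition}-{i}@{get_conf()['spark.sql.session.timeZone']}"


def run_on_executor(partition_fn: Callable[[int], Iterable[T]], partition: int) -> List[T]:
    """A fresh thread stands in for an executor: it doesn't inherit the driver's thread-local."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: list(partition_fn(partition))).result()


# Driver side: the user's setting is active on this thread only.
with with_existing_conf({"spark.sql.session.timeZone": "America/Los_Angeles"}):
    stage = ConfInjectingStage(read_timezone)  # snapshot taken on the driver


def chained_after(partition: int) -> Iterator[str]:
    """A downstream step chained after the wrapper: it sees the executor's defaults."""
    for item in stage.compute(partition):
        yield f"{item}|{get_conf()['spark.sql.session.timeZone']}"


print(run_on_executor(read_timezone, 0))  # ['p0-0@UTC', 'p0-1@UTC']
print(run_on_executor(stage.compute, 0))  # ['p0-0@America/Los_Angeles', 'p0-1@America/Los_Angeles']
print(run_on_executor(chained_after, 0))  # ['p0-0@America/Los_Angeles|UTC', 'p0-1@America/Los_Angeles|UTC']
```

In this sketch every pull from the parent is wrapped, because Python generators are lazy: most of the parent's work happens when elements are pulled rather than when the iterator is created, so wrapping only iterator creation would miss it in this toy model. It is also why placement matters: the wrapper has to sit after the conf-dependent component for that component to see the snapshot.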
Impact
None
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
CI report:
- 285ed43d8ca7b525c3bd5334a0acd19ec2c7757c Azure: SUCCESS
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build