hudi
[HUDI-4584][Stacked on 6351] Fixing `SQLConf` not being propagated to executor
Change Logs
`SQLConf` is not propagated from the Driver to Executors by default. This is why configuration specified with `--conf` is not respected by some components invoked on the Executor (see https://github.com/apache/hudi/issues/6278).

In Spark this has been remediated by the introduction of `SQLExecutionRDD`, which snapshots `SQLConf` and subsequently overrides it on the Executor side. However, this has the following limitations:

1. `SQLExecutionRDD` propagates `SQLConf` only to the RDDs that precede it (i.e. its dependencies), not to the ones chained after it.
2. `SQLExecutionRDD` is only applicable to `RDD[InternalRow]`.

To work around #2, we've introduced `SQLConfInjectingRDD` (which is just a generalization of `SQLExecutionRDD`). To address #1, we make sure it is properly injected inside, for example, the `HoodieSparkUtils.createRdd` invocation (which is carried out on the Executor side).
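
The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `SQLConfInjectingRDDSketch` and its field names are hypothetical, and it assumes that `SQLConf.withExistingConf`, `SQLConf.getAllConfs` and `SQLConf.setConfString` are accessible in the Spark version in use.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.internal.SQLConf

// Hypothetical sketch: wraps any RDD[T] (not only RDD[InternalRow]) and
// re-installs a snapshot of the Driver's SQLConf while the parent is computed.
class SQLConfInjectingRDDSketch[T: ClassTag](parent: RDD[T], conf: SQLConf)
  extends RDD[T](parent) {

  // Snapshot taken on the Driver; a plain immutable Map is cheap to serialize.
  private val capturedConfs: Map[String, String] = conf.getAllConfs

  // Rebuilt lazily on the Executor from the snapshot; never shipped itself.
  @transient private lazy val executorSideConf: SQLConf = {
    val c = new SQLConf()
    capturedConfs.foreach { case (k, v) => c.setConfString(k, v) }
    c
  }

  // Keep the parent's partitioning so the wrapper stays transparent.
  override val partitioner: Option[Partitioner] = firstParent[T].partitioner

  override protected def getPartitions: Array[Partition] = firstParent[T].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    // SQLConf.get resolves to executorSideConf on this thread inside the block.
    SQLConf.withExistingConf(executorSideConf) {
      firstParent[T].iterator(split, context)
    }
}
```

Under these assumptions, a caller would wrap the final RDD with something like `new SQLConfInjectingRDDSketch(rdd, spark.sessionState.conf)`. Note that in this sketch the conf is active only while `compute` assembles the parent iterator; code that reads `SQLConf.get` lazily, after `compute` has returned, would not see it.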
Impact
None
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
CI report:
- 285ed43d8ca7b525c3bd5334a0acd19ec2c7757c Azure: SUCCESS