
[HUDI-4584][Stacked on 6351] Fixing `SQLConf` not being propagated to executor

Open alexeykudinkin opened this issue 2 years ago • 1 comments

Change Logs

By default, SQLConf is not propagated from the Driver to the Executors. As a result, configuration specified with --conf is not respected by some components invoked on the Executor (see https://github.com/apache/hudi/issues/6278).

In Spark this has been remediated by the introduction of SQLExecutionRDD, which snapshots SQLConf and subsequently overrides it on the Executor side. However, this approach has the following limitations:

  1. SQLExecutionRDD propagates SQLConf only to the RDDs preceding it (i.e., its dependencies), not to the ones chained after it.
  2. SQLExecutionRDD is only applicable to RDD[InternalRow].

To work around limitation 2, we introduce SQLConfInjectingRDD (which is just a generalization of SQLExecutionRDD). To address limitation 1, we make sure that it is properly injected inside, for example, the HoodieSparkUtils.createRdd invocation (which is carried out on the Executor side).
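The snapshot-and-restore idea described above can be sketched roughly as follows. This is a minimal illustration assuming Spark is on the classpath, not the code from this PR: the class name `ConfSnapshotRDD` is hypothetical, and the sketch relies only on the existing Spark calls `SQLConf.getAllConfs`, `SQLConf.setConfString` and `SQLConf.withExistingConf`.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.internal.SQLConf

// Hypothetical name: a sketch modeled on the description above, not the PR's class.
class ConfSnapshotRDD[T: ClassTag](prev: RDD[T], conf: SQLConf) extends RDD[T](prev) {

  // Captured when the RDD is constructed (on the Driver); only key/value strings are kept.
  private val sqlConfigs: Map[String, String] = conf.getAllConfs

  // Rebuilt lazily on the Executor side from the captured entries.
  @transient private lazy val executorSideConf: SQLConf = {
    val newConf = new SQLConf()
    sqlConfigs.foreach { case (k, v) => newConf.setConfString(k, v) }
    newConf
  }

  override val partitioner: Option[Partitioner] = firstParent[T].partitioner

  override protected def getPartitions: Array[Partition] = firstParent[T].partitions

  // The snapshot is active while the parent's iterator is being built,
  // so only upstream RDDs (this RDD's dependencies) observe it.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    SQLConf.withExistingConf(executorSideConf) {
      firstParent[T].iterator(split, context)
    }
}
```

Under these assumptions, a caller would wrap the last RDD whose computation needs the configuration, for instance `new ConfSnapshotRDD(upstream, SQLConf.get)`. Because the type parameter is generic, the wrapper is not tied to RDD[InternalRow]; because it only covers its dependencies, it has to be placed downstream of the code that reads SQLConf.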

Impact

None

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

alexeykudinkin, Aug 10 '22 02:08

CI report:

  • 285ed43d8ca7b525c3bd5334a0acd19ec2c7757c Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot, Aug 23 '22 10:08