@hudi-bot run azure
@hudi-bot run azure
@nsivabalan @danny0405 @yihua hi, CI has all passed, could you help review it?
@danny0405 yes, it is ready for review.
@hudi-bot run azure
@smileyboy2019 you need to enable the `spark3.4` Maven profile for the pom.xml in IDEA.
@zhangjw123321 it looks like `hoodie.bulkinsert.shuffle.parallelism` does not take effect on a non-partitioned table in the code. From the Spark UI, you may not have set `spark.default.parallelism`, so `reduceByKey` falls back to the parent RDD's partition size.
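For context, a minimal sketch (not the Hudi code path itself; master, app name, and partition counts are just placeholders) of how `reduceByKey` picks its partition count when no explicit number is passed:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only; all names and numbers here are placeholder values.
val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("parallelism-demo"))
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), numSlices = 8)

// No numPartitions argument and spark.default.parallelism not set in the conf:
// Spark's default partitioner falls back to the parent RDD's partition count (8 here).
val byParent = pairs.reduceByKey(_ + _)

// With an explicit count (or spark.default.parallelism set), that value is used instead.
val explicit = pairs.reduceByKey(_ + _, numPartitions = 200)
```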
@zhangjw123321 I tested locally; setting `spark.default.parallelism` via SQL `set` does not seem to take effect. Can you set it when submitting the Spark job, e.g. with `--conf`? Before trying it, how many cores...
@zhangjw123321 you can try setting it in spark-submit with `--conf`, or in code with `sparkConf.set("xxx", "yyy")`; then the other branch will be taken and the parent RDD's partition size will not be used. See the sketch below.
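For example, a rough sketch of setting it in code before the session is created (the value 200 and the app name are only placeholders; pick the parallelism based on your cluster's cores):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Placeholder value; equivalent to passing --conf spark.default.parallelism=200 to spark-submit.
val conf = new SparkConf().set("spark.default.parallelism", "200")

val spark = SparkSession.builder()
  .config(conf)
  .appName("bulk-insert-demo") // illustrative app name
  .getOrCreate()
```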
@zhangjw123321 I created an issue to track it: https://issues.apache.org/jira/browse/HUDI-7277