KnightChess comments

Results: 70 comments of KnightChess

[WIP][HUDI-6472] fix spark sql does not ignore case

Sorry for the late reply. @jonvex I will close this PR. Thank you for your work on it.

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

@bk-mz yes, MOR does not support the Parquet native bloom filter. Log files are merged on read, so the native bloom filter does not reflect the latest data and is not accurate; only `cow` or...
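
To illustrate the point above, here is a toy model in plain Python. It is not Hudi's actual read path: a plain `set` stands in for the Parquet bloom filter, and all names (`base_file_rows`, `log_file_updates`, `merge_on_read`, and so on) are hypothetical. It only shows why a filter built when the base file was written can wrongly skip data once a log file carries a newer value.

```python
# Toy model only: a set stands in for the Parquet bloom filter
# (so there are no false positives). All names are hypothetical.
base_file_rows = {"k1": "a", "k2": "b"}      # records in the base Parquet file
base_bloom = set(base_file_rows.values())   # filter built when the base file was written
log_file_updates = {"k2": "c"}               # later update stored in a log file


def merge_on_read(base, log):
    """Apply log-file updates on top of the base-file records."""
    merged = dict(base)
    merged.update(log)
    return merged


def query_with_native_filter(value):
    """Consult the base file's filter first, then scan the merged view."""
    if value not in base_bloom:
        return []  # data skipped: the filter has never seen the log-file value
    merged = merge_on_read(base_file_rows, log_file_updates)
    return [k for k, v in merged.items() if v == value]


def query_without_filter(value):
    """Always scan the merged view."""
    merged = merge_on_read(base_file_rows, log_file_updates)
    return [k for k, v in merged.items() if v == value]
```

In this sketch, looking up the value `"c"` through the stale filter misses `k2`, while the unfiltered merged scan finds it.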

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

@bk-mz yes, `set hoodie.datasource.query.type = read_optimized`
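
For readers using the DataFrame API instead of SQL, a minimal sketch of the same setting follows. The helper name `read_optimized` and the `base_path` argument are hypothetical; the sketch assumes a `SparkSession` with the Hudi bundle on its classpath.

```python
# Option key and value taken from the comment above.
READ_OPTIMIZED_OPTS = {"hoodie.datasource.query.type": "read_optimized"}


def read_optimized(spark, base_path):
    """Load a Hudi table in read-optimized mode (hypothetical helper).

    `spark` is assumed to be a SparkSession with Hudi available;
    `base_path` is the table's base path.
    """
    return (
        spark.read.format("hudi")
        .options(**READ_OPTIMIZED_OPTS)
        .load(base_path)
    )
```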

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

@bk-mz the operating system cache may also have an impact. Can you provide detailed metrics from the Spark UI?

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

@bk-mz you can check `number of output rows` for the scan RDD in the SQL tab of the Spark UI.

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

@bk-mz can you see the time cost at this point?

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

![image](https://github.com/apache/hudi/assets/20125927/2dd2b745-96b2-464d-8541-1119197bed48)

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

We can only analyse the scan RDD. A query spends time in various places, so I think the result is normal.

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

@bk-mz yes, according to the metrics, it is working.

[SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time

A variety of factors can lead to differences in query time: at the system level, I/O, CPU, disk load...; in Spark, parallelism, executor scale-up time...; in Hudi, snapshot...