[HUDI-7267] Fix dataSkipping with null column stats
As shown in the picture, the column stats index (CSI) uses Parquet column chunk metadata to calculate min/max values and saves them to the column stats of the metadata table (MDT). For complex columns such as `info array<struct<name: string, age: int>>`, the Parquet metadata contains only the leaf columns `info.array.name` and `info.array.age`, but Hudi computes stats only for the top-level `info` column, so its stats in the MDT will be null.
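If a SQL filter contains `IsNotNull(info)`, all files will be skipped: a comparison against a null statistic evaluates to null rather than true, so no file passes the index check.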
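The failure can be sketched in Python under stated assumptions: each file's stats are a simple record, and `IsNotNull(col)` is rewritten on the index side into `null_count < value_count` (a simplification; `ColumnStats`, `is_not_null_filter`, and `files_to_scan` are hypothetical names, not Hudi APIs). Under SQL three-valued logic the comparison yields null when a statistic is null, and a naive pruning step that keeps only files whose filter is exactly true then drops every file.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnStats:
    """Simplified per-file column stats record (hypothetical, not Hudi's schema)."""
    min_value: Optional[object]
    max_value: Optional[object]
    null_count: Optional[int]
    value_count: Optional[int]


def sql_less_than(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL three-valued '<': a null operand makes the result null (None)."""
    if a is None or b is None:
        return None
    return a < b


def is_not_null_filter(stats: Optional[ColumnStats]) -> Optional[bool]:
    """Assumed index-side rewrite of IsNotNull(col): null_count < value_count."""
    if stats is None:
        return None
    return sql_less_than(stats.null_count, stats.value_count)


def files_to_scan(index: Dict[str, Dict[str, ColumnStats]], column: str) -> List[str]:
    """Naive pruning: keep a file only if the rewritten filter is exactly True."""
    return [name for name, cols in index.items()
            if is_not_null_filter(cols.get(column)) is True]


# Stats for the top-level "info" column were never computed, so every field is null.
index = {
    "file1.parquet": {"info": ColumnStats(None, None, None, None)},
    "file2.parquet": {"info": ColumnStats(None, None, None, None)},
}
print(files_to_scan(index, "info"))  # [] -> every file is skipped
```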
Common (non-nested) columns may run into related issues: a column added later through schema evolution does not exist in older files, so those files have no stats for it either. To keep the logic clean, check for null before evaluating the min, max, and null-count values.
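A null-safe variant can be sketched in Python, assuming the intended behavior is to keep (not prune) a file whenever the statistics the filter needs are missing or null, since such stats cannot prove the file has no matching rows. `prune_files` and `is_not_null` are hypothetical helpers for illustration, not Hudi's actual code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Stats = Dict[str, Optional[int]]


def prune_files(
    index: Dict[str, Dict[str, Optional[Stats]]],
    column: str,
    predicate: Callable[[Stats], bool],
    required: Tuple[str, ...] = ("null_count", "value_count"),
) -> List[str]:
    """Null-safe pruning: only drop a file when its stats are present and rule it out."""
    kept = []
    for name, cols in index.items():
        stats = cols.get(column)
        if stats is None or any(stats.get(key) is None for key in required):
            # Missing or null stats cannot prove the file lacks matches: keep it.
            kept.append(name)
        elif predicate(stats):
            kept.append(name)
    return kept


def is_not_null(stats: Stats) -> bool:
    # Only called once the null checks above have passed.
    return stats["null_count"] < stats["value_count"]


index = {
    "old.parquet": {},  # written before the column existed: no entry at all
    "nested.parquet": {"info": {"null_count": None, "value_count": None}},
    "all_null.parquet": {"info": {"null_count": 10, "value_count": 10}},
    "has_values.parquet": {"info": {"null_count": 2, "value_count": 10}},
}
print(prune_files(index, "info", is_not_null))
# ['old.parquet', 'nested.parquet', 'has_values.parquet']
```

Keeping the file on null stats trades some pruning efficiency for correctness: the query still scans files the index knows nothing about, while files whose stats positively rule them out (here `all_null.parquet`) are still skipped.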
Change Logs
- Check for null before evaluating the min, max, and null-count values
Impact
None
Risk level (write none, low, medium or high below)
low
Documentation Update
None
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
I see related changes: https://github.com/apache/hudi/pull/10389
CI report:
- d07bc703721ad554a2ada4c0da1697eb7bd1a996 Azure: CANCELED
Looks like it addresses the same problem, so I'm closing this one. Thanks, @danny0405.