incubator-gluten
[VL] Result mismatch issues Tracker
Backend
VL (Velox)
Bug description
There are several data mismatch issues related to either operators or functions. Some of the fixes have landed in Gluten, and some are in the Velox repo. We will use this issue to track their status, as these are critical for production environments.
- [x] complex datatype returns wrong value, disabled in Gluten for now
- [x] parquet scan + filter pushdown wrongly returns "", should return null.
- [x] distinct hash agg + spill returns duplicated keys.
- [x] max_by function returns wrong result
- [x] sortBeforeRepartition
- [ ] cast(sum(decimal(20,4)), float)
- [x] get_json_object({"dScore":0.0215434648799772}, "$.dScore") (Fixed in Velox)
- [ ] cast(string as bigint)
- [x] cast(double as decimal)
- [ ] array_size(null) (issue: https://github.com/apache/incubator-gluten/issues/5248)
- [x] round(avg(cast(col as double)), 4) #5366
- [x] isNull and isNotNull in filter condition #5670
- [x] from_unixtime with overflow (Fixed by https://github.com/facebookincubator/velox/pull/9836)
- [ ] date_format (https://github.com/apache/incubator-gluten/issues/5524)
- [x] cast integer as binary (https://github.com/apache/incubator-gluten/issues/5073)
- [ ] regexp_replace('a{bc', '\{', '\[') (#6224)
- [ ] LEGACY timeParserPolicy (#6227)
- [ ] FlushableAgg (#6630)
- [ ] if expression (#6673)
- [ ] weekOfYear (#6784)
- [ ] round (#6827)
- [ ] cast string as date (#6828)
- [ ] nested decimal arithmetic expressions (#7082)
- [ ] date format week year (#7069)
- [ ] Large timestamp outside of range (#7109)
- [ ] Aggregate window gets the wrong result (#7194)
- [x] Diff of in_or_and (#7362)
- [ ] hash agg output wrong result (#7494)
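Several items above (an empty string returned where null was expected, duplicated keys after spill) are the kind of mismatch that an order-insensitive, null-aware comparison of two result sets would surface. Below is a minimal sketch of such a comparison, assuming rows have already been collected as plain tuples from vanilla Spark and from the Velox backend. The helper names `normalize` and `diff_results` and the rounding precision are hypothetical and not part of Gluten or Velox.

```python
import math
from collections import Counter

FLOAT_DIGITS = 9  # assumed precision; tune per workload


def normalize(value):
    """Map a cell to a hashable key. None stays distinct from ''."""
    if value is None:
        return ("null",)
    if isinstance(value, float):
        if math.isnan(value):
            return ("nan",)
        # Round so that tiny floating-point noise is not reported as a diff.
        return ("float", round(value, FLOAT_DIGITS))
    # Keep the type name so that 1 and "1" are never treated as equal.
    return (type(value).__name__, value)


def diff_results(expected_rows, actual_rows):
    """Compare two result sets as multisets, ignoring row order.

    Returns (missing, unexpected): rows only in expected, rows only in actual.
    Duplicated rows show up because counts are compared, not just membership.
    """
    expected = Counter(tuple(normalize(v) for v in row) for row in expected_rows)
    actual = Counter(tuple(normalize(v) for v in row) for row in actual_rows)
    return expected - actual, actual - expected


if __name__ == "__main__":
    # Hypothetical data: the second engine returned "" instead of null
    # and emitted one key twice.
    spark_rows = [("a", None), ("k", 1.5)]
    velox_rows = [("k", 1.5), ("a", ""), ("k", 1.5)]
    missing, unexpected = diff_results(spark_rows, velox_rows)
    print("missing:", dict(missing))
    print("unexpected:", dict(unexpected))
```

Rounding is a crude tolerance and can still split two nearly equal values that fall on either side of a rounding boundary, so a real check may prefer an explicit epsilon comparison for floating-point columns.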
#4678 issue in hashagg
https://github.com/oap-project/gluten/issues/4587
Currently, reading of all complex data types is disabled.
https://github.com/oap-project/gluten/pull/4818
https://github.com/oap-project/gluten/pull/4872
https://github.com/apache/incubator-gluten/issues/4891
https://github.com/apache/incubator-gluten/issues/4928
https://github.com/apache/incubator-gluten/issues/4930
https://github.com/apache/incubator-gluten/issues/4947
Three issues we encountered:
- parquet scan + filter pushdown wrongly returns "" where it should return null. Fixed by https://github.com/facebookincubator/velox/pull/9129
- distinct hash agg + spill returns duplicated keys. https://github.com/facebookincubator/velox/issues/9219
- max_by function returns wrong result
> - distinct hash agg + spill returned duplicated keys.
@FelixYBW Has this issue not been fixed by https://github.com/apache/incubator-gluten/pull/4443 ?
No, this was tested on the main branch. It is a new issue.
https://github.com/facebookincubator/velox/issues/9219
> - max_by function return wrong result
@yma11 Did you submit a fix to the issue?
Not yet. I have only pushed it to the golden branch and will submit a fix to Velox upstream.
#5253
This looks like an issue in get_json_object. @PHILO-HE, maybe we need full test coverage of the JSON functions, as with re2.
@FelixYBW, I will do that. Thanks!
https://github.com/apache/incubator-gluten/issues/5248
https://github.com/apache/incubator-gluten/issues/5366
Updated the description, thank you. Do you know which function (cast, avg, round) caused the issue?
#5372
> - max_by function return wrong result
>
> @yma11 Did you submit a fix to the issue?
>
> Not yet. Only have pushed to golden branch and will submit one in Velox upstream.
@FelixYBW This fix should be done on the C++ side. The formal fix is in a PR. Can you help review it?
Is it a Gluten issue? I'd think Velox has a bug here.
Yes, I think so. It's caused by the additional projects we added before/after shuffle. The partial/final handling logic in Velox upstream has no problem. The ideal approach is to add struct support for shuffle in Gluten so that we can remove the hack.
@PHILO-HE Any update on the issues here?
https://github.com/apache/incubator-gluten/issues/5682
#5701
@FelixYBW, some were actually fixed. I just updated the list and will fix, or seek help to fix, the other issues.