dask-sql
[DF] Optimize away DISTINCT
Is your feature request related to a problem? Please describe.
The PyTest test_compatibility::test_agg_count throws the error "ValueError: Arrow DataFusion should optimize them away!", which is unexpected and should be investigated.
Describe the solution you'd like
Aggregations should work, and this error should not occur when running the PyTests.
Describe alternatives you've considered
None
Additional context
None
Not sure if someone is working on this right now, but I don't see a ValueError anymore. It looks like COUNT(DISTINCT x) incorrectly produces the exact same output as COUNT(x), but I'm not sure why.
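For reference, here is a minimal sketch of how the two aggregates are expected to differ. It uses Python's built-in sqlite3 module as a stand-in engine, not dask-sql or DataFusion, and the table name t, the helper count_vs_count_distinct, and the sample values are made up for illustration. With duplicate values in the column, the two counts should not match.

```python
import sqlite3


def count_vs_count_distinct(values):
    """Return (COUNT(x), COUNT(DISTINCT x)) for a single-column table.

    sqlite3 is used only as a reference engine here; the table name `t`
    and this helper are hypothetical and not part of dask-sql.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
        # Both aggregates skip NULLs; only the second one drops duplicates.
        return conn.execute(
            "SELECT COUNT(x), COUNT(DISTINCT x) FROM t"
        ).fetchone()
    finally:
        conn.close()


total, distinct = count_vs_count_distinct([1, 1, 2, None, 3, 3])
print(total, distinct)  # prints "5 3"
```

If a query like this returns the same number for both columns on data with duplicates, the DISTINCT is most likely being dropped somewhere in the plan.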
Hey Sarah. Yeah, I'm working on it. I'm creating a new optimization rule to adjust the output logical plan. I hope to have it finished sometime next week.
Closing this PR should also resolve test_agg_count_no_group_by, which fails in the same way.