dask-sql

[DF] Optimize away DISTINCT

Open jdye64 opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe.
The pytest test test_compatibility::test_agg_count raises "ValueError: Arrow DataFusion should optimize them away!", which is unexpected and should be investigated.

Describe the solution you'd like
Aggregations should work, and this error should not occur when running the pytest suite.

Describe alternatives you've considered
None

Additional context
None

jdye64, May 16 '22 14:05

Not sure if someone is working on this right now, but I don't see a ValueError anymore. It looks like COUNT(DISTINCT x) incorrectly produces the exact same output as COUNT(x), but I'm not sure why.
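For reference, here is a minimal sketch of the results the two aggregates should give. It uses Python's built-in sqlite3 module rather than dask-sql, and the table t and its values are made up for illustration, so it only shows the expected SQL semantics, not the dask-sql code path.

```python
import sqlite3

# Hypothetical table and values, chosen only to make the two counts differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (2,), (None,)])

# COUNT(x) counts non-NULL rows; COUNT(DISTINCT x) counts distinct non-NULL values.
count_all, count_distinct = conn.execute(
    "SELECT COUNT(x), COUNT(DISTINCT x) FROM t"
).fetchone()

print(count_all, count_distinct)  # 3 2
```

If both columns come back equal on data with duplicates like this, the DISTINCT is being dropped somewhere in planning.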

sarahyurick, Aug 05 '22 17:08

Hey Sarah. Yeah, I'm working on it. I'm creating a new optimization rule to adjust the output LogicalPlan. I hope to have it finished sometime next week.
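As a rough illustration of the kind of plan rewrite involved (an assumption for discussion, not the actual rule), a single COUNT(DISTINCT x) can be expressed as a plain COUNT over a GROUP BY on x. The sketch below checks that equivalence with Python's built-in sqlite3 module; the table t and its values are hypothetical.

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (2,), (3,), (None,)])

# Original form: the DISTINCT lives inside the aggregate.
original = "SELECT COUNT(DISTINCT x) FROM t"

# Rewritten form: deduplicate with GROUP BY first, then use a plain COUNT.
# The NULL group is produced by GROUP BY but ignored by COUNT(x).
rewritten = "SELECT COUNT(x) FROM (SELECT x FROM t GROUP BY x)"

original_result = conn.execute(original).fetchone()[0]
rewritten_result = conn.execute(rewritten).fetchone()[0]

print(original_result, rewritten_result)  # 3 3
```

The appeal of this shape is that the execution side only needs to support non-distinct aggregates and grouping.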

jdye64, Aug 05 '22 18:08

Closing this PR should also resolve test_agg_count_no_group_by, which fails in the same way.

charlesbluca, Aug 10 '22 16:08