dask-sql icon indicating copy to clipboard operation
dask-sql copied to clipboard

[DF] select * limit 5 seems does a full scan

Open randerzander opened this issue 3 years ago • 0 comments

I'm struggling to find a programmatic reproducer for this, but on the datafusion-sql-planner branch:

c.sql("SELECT * FROM large_table limit 5")

results in reading the entire dataset before filtering at the end, instead of reading from a single partition.

Less reproducible, but from the daily weather data:

res = c.sql("select * from weather limit 5")
io_layer = list(res.dask.layers.keys())[0]
partitions = len(list(res.dask.layers[io_layer].keys()))
partitions
445

If I understand layers correctly, my select w/ limit statement is reading all partitions in the dataset.

randerzander avatar Aug 12 '22 17:08 randerzander