
Only aggregations

Open maartenbreddels opened this issue 5 years ago • 1 comment

This is a somewhat better comparison, and it makes dask run. Instead of materializing a column, we now aggregate (take the mean), and we don't ask dask to materialize the filtered dataframe. These are my results using vaex-hdf5 (parquet is much slower): image

And vaex: image
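To make the "aggregate instead of materialize" idea concrete, here is a minimal library-free sketch. It is not dask or vaex code, and it is not the benchmark itself: the `(x, y)` row layout, the `x > threshold` filter, and the helper names `iter_chunks` and `filtered_mean` are all hypothetical, chosen only for illustration. It streams over the data in chunks and keeps just a running sum and count, so the filtered rows are never collected into a new table.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[float, float]  # hypothetical schema: (x, y)


def iter_chunks(rows: Sequence[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of `rows`, mimicking out-of-core chunked reads."""
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def filtered_mean(chunks: Iterable[List[Row]], threshold: float) -> float:
    """Mean of y over rows where x > threshold, without storing the filtered rows."""
    total = 0.0
    count = 0
    for chunk in chunks:
        for x, y in chunk:
            if x > threshold:  # the filter is applied on the fly
                total += y     # only two scalars are kept as state
                count += 1
    return total / count if count else float("nan")


if __name__ == "__main__":
    rows = [(1.0, 10.0), (-1.0, 20.0), (2.0, 30.0), (3.0, 50.0)]
    print(filtered_mean(iter_chunks(rows, size=2), threshold=0.0))  # prints 30.0
```

The memory cost here is one chunk plus two numbers, regardless of how many rows pass the filter, which is why an aggregation is a fairer test than asking a library to materialize the filtered dataframe.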

maartenbreddels, Jan 23 '20 19:01

With parquet it is slightly faster, but I cannot run the filtered part:

image

maartenbreddels, Jan 23 '20 20:01