Dask: Enable Q6 to Q10 again

Open fjetter opened this issue 1 year ago • 5 comments

This does not include any changes to the existing code (with the exception of the removed config option that is ignored by dataframes anyhow) but is simply adding the Q7-Q10 queries again as they are defined right now. I haven't optimized anything here but dask is perfectly capable of running those.

fjetter avatar Nov 07 '23 10:11 fjetter

Hi Florian,

I've kicked off the workflow run for now to make sure everything works. To get the PR merged and the results updated quickly I would also like to see updates to the time.csv and logs.csv files. This way I know the code has been tested thoroughly up to 50GB. By running the benchmark yourself you can also generate the report to see how dask compares to other solutions.

You can also modify the configs so that dask spawns a different combination of workers and threads. I saw in this comment that this might be the issue: https://github.com/duckdblabs/db-benchmark/issues/56#issuecomment-1798255468
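As a rough sketch of what "different combinations" could mean in practice, the snippet below lists every worker/thread split that uses all cores of a machine. `worker_thread_layouts` is a hypothetical helper made up for illustration, not something in db-benchmark; each pair it returns could then be tried as the `n_workers` and `threads_per_worker` arguments of `dask.distributed.LocalCluster`.

```python
import os


def worker_thread_layouts(total_cores):
    """Return every (n_workers, threads_per_worker) pair whose product is total_cores.

    Hypothetical helper for illustration only; not part of db-benchmark or dask.
    """
    return [
        (n_workers, total_cores // n_workers)
        for n_workers in range(1, total_cores + 1)
        if total_cores % n_workers == 0
    ]


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single core.
    cores = os.cpu_count() or 1
    for n_workers, threads in worker_thread_layouts(cores):
        print(f"n_workers={n_workers}, threads_per_worker={threads}")
```

Which split works best probably depends on the query and the data size, so it would make sense to time a few of them rather than assume one.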

Tmonster avatar Nov 08 '23 13:11 Tmonster

I wanted to follow up with some improvements, but we can of course do it all in one go.

> To get the PR merged and the results updated quickly I would also like to see updates to the time.csv and logs.csv files.

Looks like I didn't read the readme properly. I assumed you would be running the benchmarks. I'll look into it and update the numbers.

fjetter avatar Nov 08 '23 13:11 fjetter

Whoops, looks like dask isn't even a solution in the regression.yml file. Can you add it in this PR? Then it will get automatically tested as well.

Edit: I manually cancelled the workflows earlier since dask wasn't included. They should run again automatically when you push.

Tmonster avatar Nov 08 '23 13:11 Tmonster

@fjetter seems like there is an issue with the dask group by

Tmonster avatar Nov 09 '23 09:11 Tmonster

Hi Florian,

I did some extra debugging here and found other changes that needed to be made to get dask to run. If you merge with master, all GitHub Actions checks should pass.

Tmonster avatar Nov 13 '23 09:11 Tmonster

@fjetter Hi Florian, with the release of DuckDB v1.0.0 I'm going to run the benchmark again. I tried to resolve the merge conflicts for Dask. Let me know if there's anything else I need to do.

Currently waiting for CI to pass first

Tmonster avatar Jun 04 '24 11:06 Tmonster

If CI passes I think you're good.

fjetter avatar Jun 04 '24 12:06 fjetter