benchmark feedback
As I don't have Twitter, I would like to leave feedback about the benchmark here. I strongly advise scaling up the benchmark so that a single time measurement is above 1s. Timings in milliseconds do not stress the algorithms well, as they are more affected by overheads. I am not saying to ignore overheads, but to give a bigger picture of scalability. This is also mentioned in https://cran.r-project.org/web/packages/data.table/vignettes/datatable-benchmarking.html, in case you haven't seen that doc. Warm regards
Thanks Jan, in principle I agree with you. I just ran this on the NYC Taxi dataset, which has been used a lot by people presenting arrow at various conferences this year and also on Twitter, and it is hard to get execution times longer than 1 second on it. I will soon add some modifications where I replicate the data a bit to reach longer execution times.
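The replication idea above can be sketched as follows. This is a minimal, hypothetical example (the data.table here is a stand-in for the actual taxi data, and the 10x replication factor is arbitrary) showing how stacking copies of a dataset lengthens a grouped-aggregation timing past the 1s mark:

```r
# Sketch: replicate a dataset to push benchmark timings above 1 second.
# `d` stands in for the NYC Taxi data; adjust the replication factor as needed.
library(data.table)

d  <- data.table(id = sample(1e4, 1e6, TRUE), x = rnorm(1e6))
dk <- rbindlist(rep(list(d), 10))  # 10x replication, same group structure

system.time(dk[, .(mx = max(x)), by = id])  # grouped max on the enlarged data
```

Because the replicated rows reuse the same group keys, the number of groups stays fixed while the per-group work grows, so overheads shrink relative to the measured computation.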
By the way, in my experience arrow is considerably faster than dplyr in basically all configurations, which curiously is not reflected in the data.table benchmarks (https://h2oai.github.io/db-benchmark/).
You might have been using a different version of arrow than the one in db-benchmark.
This is just one of many such cases; a very recent example: https://stackoverflow.com/questions/73403038/r-data-table-rolling-max#comment129672489_73408459
> By the way, in my experience arrow is considerably faster than dplyr in basically all constellations, which curiously is not reflected in the data.table benchmarks (https://h2oai.github.io/db-benchmark/).
The H2O benchmarks are out of date by now.
Please see the latest benchmarks now published by DuckDB.
Arrow is indeed faster than dplyr.
collapse has now been included in the DuckDB benchmarks, so I also regard this issue as resolved.