Unfairness in benchmarks

Open • Darxor opened this issue 1 year ago • 1 comment

Hello!

First, I'd like to say thanks for a great book and for bringing knowledge about polars to the R community. I do have a concern about the benchmarks in the "From an R object" section, though.

Currently you pre-initialize the polars object before running the query, but you do not convert the data.frame to a data.table, or register it with duckdb / arrow, ahead of time. https://github.com/ddotta/cookbook-rpolars/blob/e1374f9ea2ae89d177f175d61c3d22a290438cb5/book/content/benchmarking/_from_r_object.qmd#L14-L17

One could argue that DataMultiTypes_pl is no more of an R object than a duckdb connection, as both are external references and can't be directly serialized to RDS. Creating a data.table object also takes additional time (albeit negligible compared to polars and duckdb).

So I propose either starting all benchmarks from a base data.frame or pre-initializing all objects and connections.

In my testing I also found that polars has substantial initialization overhead compared to duckdb, which moves it down the rankings when initialization happens inside the timed call.
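
The two options proposed above could be compared with something like the following. This is only a minimal sketch: it assumes the polars, data.table, duckdb and microbenchmark packages are installed, and the data frame `df`, its columns and the aggregation are hypothetical stand-ins, not the book's `DataMultiTypes` query.

```r
library(polars)
library(data.table)
library(duckdb)
library(microbenchmark)

# Hypothetical stand-in data, not the book's DataMultiTypes
set.seed(1)
df <- data.frame(
  grp = sample(letters[1:5], 1e5, replace = TRUE),
  val = runif(1e5)
)

con <- DBI::dbConnect(duckdb::duckdb())

# Option 1: every expression starts from the base data.frame,
# so conversion / registration cost is included in the timing
from_df <- microbenchmark(
  polars = as.data.frame(as_polars_df(df)$select(pl$col("val")$mean())),
  data.table = as.data.table(df)[, .(val = mean(val))],
  duckdb = {
    duckdb::duckdb_register(con, "df_tmp", df)
    out <- DBI::dbGetQuery(con, "SELECT avg(val) AS val FROM df_tmp")
    duckdb::duckdb_unregister(con, "df_tmp")
    out
  },
  times = 20
)

# Option 2: all objects and connections are prepared up front,
# so only the query itself is timed
df_pl <- as_polars_df(df)
df_dt <- as.data.table(df)
duckdb::duckdb_register(con, "df_view", df)

pre_init <- microbenchmark(
  polars = as.data.frame(df_pl$select(pl$col("val")$mean())),
  data.table = df_dt[, .(val = mean(val))],
  duckdb = DBI::dbGetQuery(con, "SELECT avg(val) AS val FROM df_view"),
  times = 20
)

DBI::dbDisconnect(con, shutdown = TRUE)
```

Printing `from_df` and `pre_init` side by side shows how much of each tool's time comes from setup rather than from the query; the actual numbers depend on the machine and package versions.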

Darxor, Aug 10 '23 12:08

I agree. DuckDB can query R data frames (and Arrow Tables) directly, which none of the other tools can do. So in general, DuckDB tends to be the fastest for querying R data frames.
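
As an illustration of that direct querying, here is a minimal sketch assuming the duckdb package is installed. The built-in `mtcars` data set and the view name `mtcars_view` are arbitrary choices for the example.

```r
library(duckdb)

con <- DBI::dbConnect(duckdb::duckdb())

# Expose the R data frame to DuckDB as a virtual table;
# the data is not imported into a DuckDB table first
duckdb::duckdb_register(con, "mtcars_view", mtcars)

res <- DBI::dbGetQuery(
  con,
  "SELECT cyl, avg(mpg) AS mean_mpg FROM mtcars_view GROUP BY cyl ORDER BY cyl"
)

DBI::dbDisconnect(con, shutdown = TRUE)
```

Here `res` is an ordinary data.frame with one row per value of `cyl`.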

eitsupi, Oct 09 '23 16:10