cookbook-rpolars
inefficient data.table code in benchmarks
The data.table code is a bit unfair...
In the first example:
robject_dt <- function() {
  # Convert to data.table, filter on colInt, then aggregate by colString
  as.data.table(DataMultiTypes)[
    colInt > 2000 & colInt < 8000
  ][, .(min_colInt = min(colInt),
        mean_colInt = mean(colInt),
        max_colInt = max(colInt),
        min_colNum = min(colNum),
        mean_colNum = mean(colNum),
        max_colNum = max(colNum)),
    by = colString
  ]
}
as.data.table() makes a full copy of the data. To make a fair comparison with polars, you could build the data.table beforehand; with that change, data.table gets closer to dplyr in my benchmark.
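A minimal sketch of what "building the data.table beforehand" could look like. It assumes a synthetic stand-in for the cookbook's DataMultiTypes (only the three columns used here), and the name robject_dt_prebuilt is hypothetical. The conversion happens once, outside the function being timed, and the filter and aggregation are written as a single dt[i, j, by] call.

```r
library(data.table)

# Hypothetical stand-in for the cookbook's DataMultiTypes (assumption: same column names)
set.seed(42)
DataMultiTypes <- data.frame(
  colInt = sample(1:10000, 1e5, replace = TRUE),
  colNum = runif(1e5),
  colString = sample(c("A", "B", "C"), 1e5, replace = TRUE)
)

# Convert once, outside the timed function, so the copy is not part of the benchmark
DataMultiTypes_dt <- as.data.table(DataMultiTypes)

robject_dt_prebuilt <- function(dt = DataMultiTypes_dt) {
  # Filter (i), aggregate (j) and group (by) in one call
  dt[colInt > 2000 & colInt < 8000,
     .(min_colInt = min(colInt),
       mean_colInt = mean(colInt),
       max_colInt = max(colInt),
       min_colNum = min(colNum),
       mean_colNum = mean(colNum),
       max_colNum = max(colNum)),
     by = colString]
}

res <- robject_dt_prebuilt()
```

Only robject_dt_prebuilt() would then go inside the benchmark call, in the same way the polars DataFrame is created before its query is timed.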
In the CSV example you do not need as.data.table(), because fread() already returns a data.table. Without it, the data.table version is about 2.5x faster than dplyr (on my machine, with 10 threads for data.table) and probably beats polars (eager).
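A small sketch of the CSV case, assuming a hypothetical temporary file written with fwrite() so the example is self-contained (the cookbook's real CSV and column set may differ). The result of fread() is queried directly, without wrapping it in as.data.table().

```r
library(data.table)

# Write a small hypothetical CSV so the example runs on its own
csv_path <- tempfile(fileext = ".csv")
set.seed(1)
fwrite(data.table(
  colInt = sample(1:10000, 1e4, replace = TRUE),
  colNum = runif(1e4),
  colString = sample(c("A", "B", "C"), 1e4, replace = TRUE)
), csv_path)

# fread() already returns a data.table, so no extra copy via as.data.table()
csv_dt_fun <- function(path = csv_path) {
  fread(path)[colInt > 2000 & colInt < 8000,
              .(min_colInt = min(colInt),
                mean_colInt = mean(colInt),
                max_colInt = max(colInt)),
              by = colString]
}

res_csv <- csv_dt_fun()

# Number of threads data.table will use; timings depend on this setting
getDTthreads()
```

Since fread() and grouped aggregation are multithreaded, it is worth reporting getDTthreads() alongside any timing comparison.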
I could not run the polars code, as it was throwing errors like:
syntax error: days is not a method/attribute of the class RPolarsExprDTNameSpace
when calling the method:
(pl$col("colDate2") - pl$col("colDate1"))$dt$days
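A possible fix, as a sketch: this assumes a recent r-polars release in which the Duration accessors were renamed from days() to total_days(), following the same rename in py-polars. The two-row frame is hypothetical; only the column names come from the error above. Note also that these accessors are methods, so they need trailing parentheses.

```r
library(polars)

# Hypothetical frame with two Date columns (names taken from the failing call)
df <- pl$DataFrame(
  colDate1 = as.Date(c("2023-01-01", "2023-03-01")),
  colDate2 = as.Date(c("2023-01-11", "2023-03-31"))
)

# Date - Date gives a Duration; total_days() turns it into a whole number of days
# (assumption: total_days() replaces the older days() in the installed r-polars version)
out <- as.data.frame(
  df$with_columns(
    (pl$col("colDate2") - pl$col("colDate1"))$dt$total_days()$alias("diff_days")
  )
)
```

If the installed version is older, the equivalent call may still be $dt$days(), with parentheses.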