
inefficient data.table code in benchmarks

Open BenoitLondon opened this issue 1 year ago • 0 comments

The data.table code in the benchmarks is a bit unfair...

In the first example,

robject_dt <- function() {
  as.data.table(DataMultiTypes)[
    colInt > 2000 & colInt < 8000
  ][, .(min_colInt = min(colInt),
        mean_colInt = mean(colInt),
        mas_colInt = max(colInt),
        min_colNum = min(colNum),
        mean_colNum = mean(colNum),
        max_colNum = max(colNum)),
    by = colString
  ]
}

as.data.table() makes a full copy of the data on every call. To make a fair comparison with polars, you could build the data.table beforehand; with that change, data.table gets closer to dplyr in my benchmark.
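
A minimal sketch of what I mean, assuming a small made-up stand-in for DataMultiTypes (the real benchmark data is not reproduced here). The names DataMultiTypes_dt and robject_dt_prebuilt are hypothetical, and the third output column is named max_colInt here for readability:

```r
library(data.table)

# Hypothetical stand-in for DataMultiTypes; the real benchmark data is larger
set.seed(1)
n <- 10000L
DataMultiTypes <- data.frame(
  colInt    = sample.int(10000L, n, replace = TRUE),
  colNum    = runif(n),
  colString = sample(c("A", "B", "C"), n, replace = TRUE)
)

# Convert once, outside the function being timed,
# so the copy made by as.data.table() is not part of the benchmark
DataMultiTypes_dt <- as.data.table(DataMultiTypes)

robject_dt_prebuilt <- function() {
  DataMultiTypes_dt[
    colInt > 2000 & colInt < 8000
  ][, .(min_colInt  = min(colInt),
        mean_colInt = mean(colInt),
        max_colInt  = max(colInt),
        min_colNum  = min(colNum),
        mean_colNum = mean(colNum),
        max_colNum  = max(colNum)),
    by = colString
  ]
}

res <- robject_dt_prebuilt()
```

If the original data.frame is not needed afterwards, setDT() is another option, since it converts to a data.table by reference instead of copying.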

In the csv example you do not need as.data.table(), as fread() already returns a data.table. Without it, the data.table method gets 2.5x faster than dplyr (on my machine, with 10 threads for data.table) and probably beats polars (eager).
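
A small sketch to illustrate, assuming a tiny made-up CSV written to a temporary file (csv_path and the sample values are hypothetical, not the benchmark file):

```r
library(data.table)

# Write a small hypothetical CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
fwrite(
  data.frame(
    colInt    = c(1000L, 3000L, 5000L, 9000L),
    colString = c("A", "A", "B", "B")
  ),
  csv_path
)

# fread() returns a data.table by default, so no as.data.table() wrapper is needed
dt <- fread(csv_path)
is_dt <- is.data.table(dt)

# Filter and aggregate directly on the result of fread()
res_csv <- dt[
  colInt > 2000 & colInt < 8000,
  .(mean_colInt = mean(colInt)),
  by = colString
]
```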

I could not run the polars code, as it was throwing errors like:

syntax error: days is not a method/attribute of the class RPolarsExprDTNameSpace 
       when calling method:
       (pl$col("colDate2") - pl$col("colDate1"))$dt$days

BenoitLondon, Jan 28 '24 03:01