
groupby operations slower on JuliaDB compared to DataFrames

Open deepaksuresh opened this issue 5 years ago • 0 comments

On the dataset defined here, a groupby operation is roughly 9x slower in JuliaDB than in DataFrames.

The following benchmark was run on a dataset of size N = 1e8 from the link above.

Grouping by one column and computing the sum of another

On JuliaDB

@btime groupby(sum, df, :id1, select=:v1);                                        
  6.908 s (1710 allocations: 1.68 GiB)  

On DataFrames

@btime combine(groupby(df, :id1), :v1=>sum)
  743.827 ms (222 allocations: 762.96 MiB)

This was on Julia 1.4, DataFrames v0.21.0, and JuliaDB v0.13.0.
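
For a quick check without the full 1e8-row dataset, the comparison can be sketched on synthetic data as below. This is only a rough sketch: the row count `N`, the number of groups `K`, the value range, and the names `t`, `res_jdb`, and `res_df` are assumptions for illustration and are not taken from the linked dataset. Both packages export `groupby`, so the calls are qualified with the module name.

```julia
using BenchmarkTools, DataFrames, JuliaDB, Random

Random.seed!(1)
N = 10^6              # assumption: far smaller than the 1e8 rows benchmarked above
K = 100               # assumption: number of distinct values in :id1
id1 = rand(1:K, N)    # synthetic grouping column
v1  = rand(1:5, N)    # synthetic value column

df = DataFrame(id1 = id1, v1 = v1)    # DataFrames table
t  = table((id1 = id1, v1 = v1))      # JuliaDB (IndexedTables) table

# Compute each result once so the two can be compared.
res_jdb = JuliaDB.groupby(sum, t, :id1, select = :v1)
res_df  = combine(DataFrames.groupby(df, :id1), :v1 => sum)

# Time both; `$` interpolates the globals so they are not benchmarked as untyped globals.
@btime JuliaDB.groupby(sum, $t, :id1, select = :v1);
@btime combine(DataFrames.groupby($df, :id1), :v1 => sum);
```

Timings at this size will not necessarily show the same ratio as at N = 1e8, so this is only useful for confirming that both calls compute the same per-group sums before scaling up.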

deepaksuresh, May 17 '20 08:05