DataFrames.jl

Add multithreading to transformations of AbstractDataFrame

Open bkamins opened this issue 2 years ago • 2 comments

The infrastructure is ready for this. We just need to decide when it is worthwhile, then make and document the changes.

bkamins, Dec 09 '21 13:12

Is this issue for parallelizing over rows or operations/columns? I'm currently doing

transform!(df, names(df, Union{Missing, AbstractString}) .=> inlinestrings; renamecols=false)

on a big df and it would be nice to parallelize it over the columns.

jariji, Jul 02 '23 06:07

Over columns, as you mention. Parallelization over rows would need to be done within the function you call (e.g. inlinestrings in your case).
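Until this is built in, parallelizing over columns can be done by hand. The following is only a minimal sketch, not how DataFrames.jl would implement it: it spawns one task per selected column and writes the results back afterwards. The function `upper_or_missing` is a hypothetical stand-in for `inlinestrings`, and the toy data frame is made up for illustration.

```julia
using DataFrames

# Hypothetical per-column function, standing in for `inlinestrings`.
upper_or_missing(col) = map(s -> ismissing(s) ? missing : uppercase(s), col)

df = DataFrame(x = ["a", "b"], y = ["c", missing], z = 1:2)

# Select the string-like columns, as in the transform! call above.
cols = names(df, Union{Missing, AbstractString})

# One task per column; each task only reads its own source column.
tasks = [Threads.@spawn upper_or_missing(df[!, c]) for c in cols]

# Write the results back from the main task once each task has finished.
for (c, t) in zip(cols, tasks)
    df[!, c] = fetch(t)
end
```

This only pays off when Julia is started with several threads (e.g. `julia -t 4`) and the per-column work is heavy enough to outweigh the cost of spawning tasks.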

As a reference: this possibility of parallelization is the reason why we do not allow transform(df, :a => :b, :b => :c) if :b is not present in the data frame. If it were allowed, the second operation would depend on the output of the first, and the operations could not be run in parallel.
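A small sketch of this restriction, using a made-up data frame: every operation reads its source columns from the original data frame, so the dependent form fails, while two separate calls work. The exact error type is not assumed here, only that the call throws.

```julia
using DataFrames

df = DataFrame(a = 1:3)

# :b => :c refers to a column that does not exist in df,
# even though :a => :b in the same call would create it.
threw = try
    transform(df, :a => :b, :b => :c)
    false
catch
    true
end

# Dependent steps have to be expressed as separate calls.
step1 = transform(df, :a => :b)      # copy :a into a new column :b
step2 = transform(step1, :b => :c)   # :b now exists, so this works
```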

bkamins, Jul 02 '23 09:07