DataFrames.jl
Add multithreading to transformations of AbstractDataFrame
The infrastructure is ready for this. We just need to decide when it is worth doing, then make and document the changes.
Is this issue for parallelizing over rows or operations/columns? I'm currently doing
transform!(df, names(df, Union{Missing, AbstractString}) .=> inlinestrings; renamecols=false)
on a big df and it would be nice to parallelize it over the columns.
Over columns, as you mention. Parallelization over rows would need to be done within the function you call (e.g. inlinestrings in your case).
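Until this is built in, parallelizing over columns can be approximated by hand. The following is only a minimal sketch, not how DataFrames.jl implements or plans to implement it: `transform_columns_threaded!` is a hypothetical helper name, and the uppercase function merely stands in for a per-column function such as inlinestrings. It assumes Julia was started with more than one thread (e.g. `julia -t auto`) and that the function is safe to run concurrently on different columns.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): apply `f` to each selected
# column on its own task and write the results back under the same names.
function transform_columns_threaded!(f, df::DataFrame, cols)
    colnames = names(df, cols)
    # Each task only reads its own column, so the tasks are independent.
    tasks = [Threads.@spawn f(df[!, name]) for name in colnames]
    # Results are assigned sequentially after the parallel work is done.
    for (name, task) in zip(colnames, tasks)
        df[!, name] = fetch(task)
    end
    return df
end

df = DataFrame(a = ["x", "y"], b = ["p", missing], c = [1, 2])

# Stand-in for a per-column function; it passes missing values through.
upper(col) = [ismissing(x) ? missing : uppercase(x) for x in col]

transform_columns_threaded!(upper, df, Union{Missing, AbstractString})
```

Only the columns selected by `names(df, Union{Missing, AbstractString})` are touched, so `:c` is left as it was. Whether this is faster than the sequential call depends on the column sizes and the cost of the function.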
As a reference: this possibility of parallelization is the reason why we do not allow transform(df, :a => :b, :b => :c) if :b is not present in the data frame (because if it were allowed, parallelization would not be possible).
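To make the restriction concrete, here is a small sketch with a made-up one-column data frame. Because every operation in a single transform call is evaluated against the source data frame, the chained form fails when :b does not exist yet, and the dependent step has to go into a second call. The variable names are illustrative only.

```julia
using DataFrames

df = DataFrame(a = 1:3)

# Both operations look at the source data frame, where :b does not exist,
# so the single call is rejected.
single_call_failed = try
    transform(df, :a => :b, :b => :c)
    false
catch
    true
end

# Expressing the dependency explicitly: the second call sees the :b column
# created by the first one.
step1 = transform(df, :a => :b)
step2 = transform(step1, :b => :c)
```

Within one call the operations therefore never depend on each other's output, which is what leaves room for running them in parallel.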