DataFrames.jl
Generalization of the value parameter in the unstack function
I do not think this is a pressing need, but if it is not too complicated and carries no risk of unwanted side effects, generalizing the value parameter of unstack to accept multiple columns could be useful. A made-up example follows to give an idea of what it might do.
df = DataFrame(x1=rand('a':'d', 100), x2=rand('x':'z', 100), x3=rand('X':'Z', 100), x4=rand('X':'Z', 100))
transform!(df, [:x3, :x4] => (.==) => :eq)
udf = unstack(df, :x1, :x2, :eq, valuestransform=x -> sum((.!)(x)))
# The extended syntax should allow an expression like the following to give the same result as above.
udf = unstack(df, :x1, :x2, [:x3, :x4], valuestransform=(x, y) -> sum(x .!= y))
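For comparison, the same table can probably be obtained today without the proposed syntax by aggregating first and reshaping afterwards. The following is only a sketch of that workaround, not the proposed feature: it uses groupby and combine to reduce both value columns per (x1, x2) group and then calls the plain three-column form of unstack. The column name :neq is an arbitrary choice made for this illustration, and the row order may differ from what unstack alone would produce.

```julia
using DataFrames

df = DataFrame(x1=rand('a':'d', 100), x2=rand('x':'z', 100),
               x3=rand('X':'Z', 100), x4=rand('X':'Z', 100))

# Step 1: reduce the two value columns to one number per (x1, x2) group.
# The two-argument function receives the x3 and x4 vectors of each group.
agg = combine(groupby(df, [:x1, :x2]),
              [:x3, :x4] => ((x, y) -> sum(x .!= y)) => :neq)

# Step 2: reshape to wide format; each (x1, x2) pair is now unique,
# so no handling of duplicates is needed.
udf = unstack(agg, :x1, :x2, :neq)
```

Any (x1, x2) combination that does not occur in df ends up as missing in udf, which is the default fill of unstack.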
This is on the to-do list (the design would need to be confirmed), along with https://github.com/JuliaData/DataFrames.jl/issues/1839. I will keep this open for the future (it will not go into the 1.4 release).
I am closing this in favor of https://github.com/JuliaData/DataFrames.jl/issues/3237, so that there is a single place to discuss all related issues.