DataFrames.jl

Generalization of the value parameter in the unstack function

Open sprmnt21 opened this issue 2 years ago • 1 comments

I do not think this is a pressing need, but if it is not too complicated and does not risk unwanted side effects, a generalization of the value parameter of unstack to accept multiple columns could be useful. A made-up example follows to give an idea of what it might do.

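using DataFrames

df = DataFrame(x1=rand('a':'d', 100), x2=rand('x':'z', 100), x3=rand('X':'Z', 100), x4=rand('X':'Z', 100))
# Flag the rows where x3 and x4 are equal.
transform!(df, [:x3, :x4] => (.==) => :eq)
# For each (x1, x2) pair, count the rows where x3 and x4 differ.
udf = unstack(df, :x1, :x2, :eq, valuestransform=x -> sum((.!)(x)))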


# The extended syntax should allow an expression like the following to give the same result as above.
# (This is the proposed syntax; it is not currently supported.)

udf = unstack(df, :x1, :x2, [:x3, :x4], valuestransform=(x, y) -> sum(x .!= y))
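Until something like this is supported, the same table can probably be obtained in two steps with existing functions: aggregate per group with `groupby` and `combine`, then pivot the aggregated column with a single-value `unstack`. The sketch below is only an illustration of that idea, not a proposed implementation; the column name `n_diff` and the random seed are arbitrary choices made for the example.

```julia
using DataFrames
using Random

Random.seed!(1)  # arbitrary seed, only so the example is reproducible

df = DataFrame(x1 = rand('a':'d', 100),
               x2 = rand('x':'z', 100),
               x3 = rand('X':'Z', 100),
               x4 = rand('X':'Z', 100))

# Step 1: for each (x1, x2) group, count the rows where x3 and x4 differ.
# `n_diff` is a hypothetical column name chosen for this sketch.
agg = combine(groupby(df, [:x1, :x2]),
              [:x3, :x4] => ((x, y) -> sum(x .!= y)) => :n_diff)

# Step 2: pivot so that x1 values label the rows and x2 values become columns.
udf = unstack(agg, :x1, :x2, :n_diff)
```

This avoids the intermediate `:eq` column, at the cost of an explicit aggregation step before the call to `unstack`.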

sprmnt21 • Jun 03 '22 20:06

This is on the to-do list (the design would need to be confirmed) along with https://github.com/JuliaData/DataFrames.jl/issues/1839. I will keep this open for the future (it will not go into the 1.4 release).

bkamins • Jun 03 '22 21:06

I am closing this in favor of https://github.com/JuliaData/DataFrames.jl/issues/3237 (to have a single place to discuss all related issues).

bkamins • Dec 05 '22 11:12