DataFrames.jl

Feature request: Pairs in stack

Open pdeffebach opened this issue 1 year ago • 2 comments

This is a small feature that might save me some typing (that is, feel free to ignore)

stack is a nice function, but I frequently find myself stacking right before making a table or a plot, and raw variable names aren't very conducive to pretty plots and tables.

julia> d = DataFrame(a_1 = [1, 2], a_2 = [3, 4]);

julia> stack(d, [:a_1, :a_2])
4×2 DataFrame
 Row │ variable  value 
     │ String    Int64 
─────┼─────────────────
   1 │ a_1           1
   2 │ a_1           2
   3 │ a_2           3
   4 │ a_2           4

I end up having to do some sort of re-labeling after I stack, something clunky like the following:

    t = @chain df begin
        @rsubset :has_job == 0
        @rsubset !ismissing(:rej_value)
        stack(
            [:nojob_value,
            :rej_value_same_earn_nojob,
            :rej_value_same_cond_nojob,
            :rej_value_same_commute_nojob,
            :rej_value],
            [:jobseeker_id];
            variable_name = :type,
            value_name = :val)
        dropmissing
        @subset :val .> quantile(:val, .05) .&& :val .< quantile(:val, .95)
        # Keys must match the stacked column names exactly, or d[:type] throws a KeyError
        @aside d = Dict(
            "nojob_value" => "No job value",
            "rej_value" => "Rejected job",
            "rej_value_same_earn_nojob" => "Rejected value: earnings same",
            "rej_value_same_cond_nojob" => "Rejected value: conditions same",
            "rej_value_same_commute_nojob" => "Rejected value: commute same"
            )
        @rtransform :type = d[:type]
    end

It would be cool if the labels could be given as Pairs in the stack call itself:

stack(d, [:a_1 => "A 1", :a_2 => "A 2"])

to produce

4×2 DataFrame
 Row │ variable  value 
     │ String    Int64 
─────┼─────────────────
   1 │ A 1           1
   2 │ A 1           2
   3 │ A 2           3
   4 │ A 2           4
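
For comparison, the requested behavior can be approximated today with a small wrapper. This is only a minimal sketch: `stack_labeled` is a hypothetical helper name, not part of DataFrames.jl, and it assumes every stacked column appears on the left-hand side of some pair.

```julia
using DataFrames

# Hypothetical helper (NOT a DataFrames.jl function): stack the columns on the
# left-hand side of each pair, then swap each variable name for its label.
function stack_labeled(df::AbstractDataFrame, pairs::AbstractVector{<:Pair};
                       variable_name = :variable, value_name = :value)
    cols = [Symbol(first(p)) for p in pairs]
    labels = Dict(string(first(p)) => string(last(p)) for p in pairs)
    long = stack(df, cols; variable_name = variable_name, value_name = value_name)
    # Replace the variable column with the human-readable labels
    long[!, variable_name] = [labels[string(v)] for v in long[!, variable_name]]
    return long
end

d = DataFrame(a_1 = [1, 2], a_2 = [3, 4])
stack_labeled(d, [:a_1 => "A 1", :a_2 => "A 2"])
```

Because the pairs are passed in one place, the labels travel with the column selection, which is the ergonomics the proposal is after.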

pdeffebach, Jan 26 '24 16:01

I don't know DataFramesMeta, but that looks quite complicated. What about just:

vars = [:a_1 => "A 1", :a_2 => "A 2"]
stack(rename(d, vars), last.(vars))

ericphanson, Apr 29 '24 11:04

That's harder to write in interactive settings. If you are at the REPL, you have to know in advance which variables you are stacking before you start typing the stack command.

pdeffebach, Apr 29 '24 17:04