DataFrames.jl
Feature request: Pairs in stack
This is a small feature that might save me some typing (that is, feel free to ignore).
The stack function is nice, but I frequently find that I want to stack right before making a table or a plot, and variable names aren't very conducive to pretty plots and tables.
julia> d = DataFrame(a_1 = [1, 2], a_2 = [3, 4]);
julia> stack(d, [:a_1, :a_2])
4×2 DataFrame
 Row │ variable  value
     │ String    Int64
─────┼─────────────────
   1 │ a_1           1
   2 │ a_1           2
   3 │ a_2           3
   4 │ a_2           4
I end up having to do some sort of relabeling after I stack, something clunky like the following:
using DataFrames, DataFramesMeta, Statistics  # Statistics provides quantile

t = @chain df begin
    @rsubset :has_job == 0
    @rsubset !ismissing(:rej_value)
    stack(
        [:nojob_value,
         :rej_value_same_earn_nojob,
         :rej_value_same_cond_nojob,
         :rej_value_same_commute_nojob,
         :rej_value],
        [:jobseeker_id];
        variable_name = :type,
        value_name = :val)
    dropmissing
    # keep only values strictly between the 5th and 95th percentiles
    @subset :val .> quantile(:val, .05) .&& :val .< quantile(:val, .95)
    # keys must match the stacked column names exactly, or d[:type] throws a KeyError
    @aside d = Dict(
        "nojob_value" => "No job value",
        "rej_value" => "Rejected job",
        "rej_value_same_earn_nojob" => "Rejected value: earnings same",
        "rej_value_same_cond_nojob" => "Rejected value: conditions same",
        "rej_value_same_commute_nojob" => "Rejected value: commute same"
    )
    @rtransform :type = d[:type]
end
It would be cool to be able to give the labels as a Pair during the stack phase:
stack(d, [:a_1 => "A 1", :a_2 => "A 2"])
to produce
4×2 DataFrame
 Row │ variable  value
     │ String    Int64
─────┼─────────────────
   1 │ A 1           1
   2 │ A 1           2
   3 │ A 2           3
   4 │ A 2           4
I don't know DataFramesMeta, but that looks quite complicated. What about just:
vars = [:a_1 => "A 1", :a_2 => "A 2"]
stack(rename(d, vars), last.(vars))
That's harder to write in interactive settings. If you are at the REPL, you would have to know in advance which variables you are stacking before you start typing the stack command.
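For illustration, the requested behavior can be approximated today with a small user-side wrapper. This is only a minimal sketch, not part of DataFrames.jl: the name stack_labeled is hypothetical, and it assumes the labels are given as a vector of column => label pairs. It stacks on the left-hand sides and then relabels the variable column, so the pairs are written inline in the call itself.

```julia
using DataFrames

# Hypothetical helper, NOT a DataFrames.jl function: stack the columns named on
# the left of each pair, then replace the variable names with the labels on the right.
# Extra positional arguments (e.g. id_vars) and keywords are forwarded to stack.
function stack_labeled(df::AbstractDataFrame, pairs::AbstractVector{<:Pair}, args...;
                       variable_name = :variable, kwargs...)
    long = stack(df, first.(pairs), args...; variable_name = variable_name, kwargs...)
    # map each original column name (as a string) to its display label
    labels = Dict(string(k) => v for (k, v) in pairs)
    long[!, variable_name] = [labels[string(x)] for x in long[!, variable_name]]
    return long
end

d = DataFrame(a_1 = [1, 2], a_2 = [3, 4])
long = stack_labeled(d, [:a_1 => "A 1", :a_2 => "A 2"])
```

Relabeling after the stack, rather than renaming the columns first, means the original data frame and its column names are left untouched.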