RegressionTables.jl
`labels` fails when printing categorical levels?
Not sure if I'm missing anything here, but it seems to me that replacing column names with labels fails when the printed coefficients are levels of a categorical variable:
julia> using DataFrames, FixedEffectModels, RegressionTables
julia> df = DataFrame(mycol1 = rand(["a", "b", "c"], 10), mycol2 = rand(10), y = rand(10));
julia> labeldict = Dict("mycol1" => "Column 1", "mycol2" => "Column 2", "y" => "depvar");
julia> regtable(reg(df, @formula(y ~ mycol1 + mycol2)), labels = labeldict)
---------------------
               depvar
              -------
                  (1)
---------------------
(Intercept)     0.186
              (0.175)
mycol1: b      -0.117
              (0.166)
mycol1: c      -0.058
              (0.186)
Column 2       0.568*
              (0.227)
---------------------
Estimator         OLS
---------------------
N                  10
R2              0.543
---------------------
(jl_Avs96t) pkg> st
Status `C:\Users\ngudat\AppData\Local\Temp\jl_Avs96t\Project.toml`
[9d5cd8c9] FixedEffectModels v1.6.1
[d519eb52] RegressionTables v0.5.1
Thanks, that's quite possible. What's your `labeldict`?
Sorry, not sure how that went missing - have edited above!
The key in the `labels` `Dict` is really the string to be replaced, e.g. you can do:
julia> labeldict = Dict("mycol1: b" => "Column 1", "mycol1: c" => "Blah", "mycol2" => "Column 2", "y" => "depvar");
julia> regtable(reg(df, @formula(y ~ mycol1 + mycol2)), labels = labeldict)
---------------------
               depvar
              -------
                  (1)
---------------------
(Intercept)   0.807**
              (0.165)
Column 1       -0.167
              (0.213)
Blah           -0.113
              (0.158)
Column 2       -0.054
              (0.282)
---------------------
Estimator         OLS
---------------------
N                  10
R2              0.164
---------------------
I'm leaning towards keeping it like that, because it's more flexible than referring to the variable name directly. Let me know what you think.
I don't have time to try right now, but wouldn't `transform_labels` be an option for that purpose?
I see, thanks for explaining and providing a workaround.
I still think it would make sense to do this automatically, though. From an API perspective, specifying a label for a categorical variable in `labels` currently just doesn't do anything, which is a bit surprising. Also, with quite a few levels it becomes tedious, as one would have to write something like `["col1: $i" for i in unique(df.col1)] .=> ["Column 1: $i" for i in unique(df.col1)]` to achieve the effect that one might naively expect `"col1" => "Column 1"` to have.
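For reference, a minimal sketch of building those per-level entries in one go. It uses only Base Julia; the vector `col1` is a hypothetical stand-in for `df.col1`, and the `"col1: <level>"` key format is taken from the coefficient names printed in the tables in this thread.

```julia
# Hypothetical stand-in for `df.col1`; the levels are made up for illustration.
col1 = ["a", "b", "c", "b", "a"]

# One entry per level, keyed by the full coefficient name as printed
# in the table, e.g. "col1: b" => "Column 1: b".
level_labels = Dict("col1: $lvl" => "Column 1: $lvl" for lvl in unique(col1))

# Combine with labels for the remaining (hypothetical) variables.
labeldict = merge(level_labels, Dict("col2" => "Column 2", "y" => "depvar"))
```

The resulting `labeldict` could then be passed as `labels = labeldict`. The entry for the reference level never shows up as a coefficient, so it is presumably just unused.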
The problem, however, is that I'm not sure how this would be implemented if one still wanted to allow users to replace the full string with something specific. E.g. if the user specifies `"col1: b" => "something special", "col1" => "Column 1"`, the result would depend on the order of execution of the two `replace` calls (which I guess isn't guaranteed when a `Dict` is iterated?). It might be safe to assume that no user would actually try to specify both, though?
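To illustrate the ordering concern, here is a small sketch using Base `replace` on a single coefficient name. This is not necessarily how RegressionTables applies labels internally; it only shows that two overlapping substring rules give different results depending on which runs first, and one possible way (longest key first, an assumption on my part) to make the outcome deterministic.

```julia
name = "col1: b"
general  = "col1" => "Column 1"
specific = "col1: b" => "something special"

# General rule first: the specific pattern no longer matches afterwards.
general_first = replace(replace(name, general), specific)

# Specific rule first: the general pattern no longer matches afterwards.
specific_first = replace(replace(name, specific), general)

# One possible deterministic scheme: apply rules with longer keys first,
# independent of the Dict's iteration order.
rules = Dict(general, specific)
ordered = sort(collect(rules); by = p -> length(first(p)), rev = true)
longest_first = foldl((s, p) -> replace(s, p), ordered; init = name)
```

Here `general_first` ends up as `"Column 1: b"`, while `specific_first` and `longest_first` both give `"something special"`.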
I've tried it now, and both of these work:
regtable(reg(df, @formula(y ~ mycol1 + mycol2)),
    labels = Dict("mycol2" => "Column 2", "y" => "depvar"),
    transform_labels = Dict("mycol1" => "Column 1")
)
regtable(reg(df, @formula(y ~ mycol1 + mycol2)), transform_labels = labeldict)
From worker 2: ---------------------
From worker 2:                depvar
From worker 2:               -------
From worker 2:                   (1)
From worker 2: ---------------------
From worker 2: (Intercept)     0.258
From worker 2:               (0.197)
From worker 2: Column 1: b     0.011
From worker 2:               (0.222)
From worker 2: Column 1: c    -0.137
From worker 2:               (0.178)
From worker 2: Column 2        0.242
From worker 2:               (0.298)
From worker 2: ---------------------
From worker 2: Estimator         OLS
From worker 2: ---------------------
From worker 2: N                  10
From worker 2: R2              0.154
From worker 2: ---------------------
`transform_labels` replaces substrings in all labels, so it only works if `mycol1` doesn't show up anywhere else.
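A quick sketch of that caveat, mimicking substring replacement with Base `replace` rather than calling the package itself. The coefficient name `mycol10` is hypothetical and only there to show an unintended match; including the `": "` separator in the key is one possible way to narrow the pattern, assuming the printed names keep the `variable: level` format seen above.

```julia
# Coefficient names as they might be printed; "mycol10" is a made-up extra column.
coefnames = ["mycol1: b", "mycol1: c", "mycol10", "mycol2"]

# Plain substring replacement also hits "mycol10".
broad = [replace(n, "mycol1" => "Column 1") for n in coefnames]

# Including the separator restricts the match to the categorical's levels.
narrow = [replace(n, "mycol1: " => "Column 1: ") for n in coefnames]
```

With the broad key, `mycol10` turns into `Column 10`; with the narrower key it is left untouched.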
Okay, in that case I think it's probably fine as is, maybe with a note in the README for `labels` that `transform_labels` should be used to change column names for categorical variables.