RegressionTables.jl

`labels` fails when printing categorical levels?

Open · nilshg opened this issue 3 years ago · 7 comments

Not sure whether I'm missing something here, but it seems to me that replacing column names with labels fails when the printed coefficients are levels of categorical variables:

julia> using DataFrames, FixedEffectModels, RegressionTables

julia> df = DataFrame(mycol1 = rand(["a", "b", "c"], 10), mycol2 = rand(10), y = rand(10));

julia> labeldict = Dict("mycol1" => "Column 1", "mycol2" => "Column 2", "y" => "depvar");

julia> regtable(reg(df, @formula(y ~ mycol1 + mycol2)), labels = labeldict)

---------------------
               depvar
              -------
                  (1)
---------------------
(Intercept)     0.186
              (0.175)
mycol1: b      -0.117
              (0.166)
mycol1: c      -0.058
              (0.186)
Column 2       0.568*
              (0.227)
---------------------
Estimator         OLS
---------------------
N                  10
R2              0.543
---------------------

(jl_Avs96t) pkg> st
      Status `C:\Users\ngudat\AppData\Local\Temp\jl_Avs96t\Project.toml`
  [9d5cd8c9] FixedEffectModels v1.6.1
  [d519eb52] RegressionTables v0.5.1

nilshg · Apr 22 '21 15:04

Thanks, that's quite possible. What's your labeldict?

jmboehm · Apr 22 '21 15:04

Sorry, not sure how that went missing - have edited above!

nilshg · Apr 22 '21 15:04

The keys in the labels Dict are really the full strings to be replaced. For example, you can do:

julia> labeldict = Dict("mycol1: b" => "Column 1", "mycol1: c" => "Blah", "mycol2" => "Column 2", "y" => "depvar");

julia> regtable(reg(df, @formula(y ~ mycol1 + mycol2)), labels = labeldict)

---------------------
               depvar
              -------
                  (1)
---------------------
(Intercept)   0.807**
              (0.165)
Column 1       -0.167
              (0.213)
Blah           -0.113
              (0.158)
Column 2       -0.054
              (0.282)
---------------------
Estimator         OLS
---------------------
N                  10
R2              0.164
---------------------

I'm leaning towards keeping it like that, because it's more flexible than referring to the variable name directly. Let me know what you think.

jmboehm · Apr 22 '21 17:04

I don't have time to try right now, but wouldn't transform_labels be an option for that purpose?

greimel · Apr 22 '21 17:04

I see, thanks for explaining and providing a workaround.

I still think it would make sense to do this automatically, though. From an API perspective, specifying a label for a categorical variable in labels currently just doesn't do anything, which is a bit surprising. Also, if a variable has many levels, it becomes tedious, as one would have to write something like ["col1: $i" for i in unique(df.col1)] .=> ["Column 1: $i" for i in unique(df.col1)] to achieve the effect one might naively expect from "col1" => "Column 1".
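A small helper along these lines could generate the per-level entries instead of writing them out by hand. This is a sketch under two assumptions: that categorical coefficients are printed as `name: level` (as in the output above), and that `labels` keys are matched as whole strings. `level_labels` is a hypothetical name, not part of RegressionTables.

```julia
# Hypothetical helper (not part of RegressionTables): build one label entry per
# categorical level, assuming coefficients are printed as "name: level".
function level_labels(varname::AbstractString, label::AbstractString, lvls)
    return Dict("$varname: $lvl" => "$label: $lvl" for lvl in lvls)
end

# Levels as they would come from unique(df.mycol1). The base level "a" is
# absorbed into the intercept, so its entry simply never matches a coefficient.
lvls = ["a", "b", "c"]

# Combine ordinary labels with the generated per-level labels.
labeldict = merge(
    Dict("mycol2" => "Column 2", "y" => "depvar"),
    level_labels("mycol1", "Column 1", lvls),
)
```

The resulting labeldict can then be passed as `labels = labeldict`, in the same way as the hand-written workaround earlier in the thread.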

The problem, however, is that I'm not sure how this would be implemented if users should still be able to replace the full string with something specific. For example, if the user passes "col1: b" => "something special" and "col1" => "Column 1", the result would depend on the order in which the two replace calls are executed (which I guess isn't guaranteed when iterating over a Dict?). It might be safe to assume that no user would actually try to specify both, though?
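One way around the ordering question is to avoid iterating over the Dict at all and instead look keys up directly with a fixed precedence. The sketch below is hypothetical and is not how RegressionTables resolves labels: an exact key such as `"col1: b"` wins; otherwise a key matching the variable part before `": "` is applied; otherwise the name is left unchanged. `resolve_label` is an illustrative name.

```julia
# Hypothetical resolution rule, not RegressionTables' implementation.
# Keys are looked up directly, so the result does not depend on Dict iteration order.
function resolve_label(name::AbstractString, labels::AbstractDict)
    # 1. An exact match on the full coefficient name takes precedence.
    haskey(labels, name) && return labels[name]
    # 2. Otherwise split "var: level" once and relabel only the variable part.
    parts = split(name, ": "; limit = 2)
    if length(parts) == 2
        varname = String(parts[1])
        haskey(labels, varname) && return labels[varname] * ": " * parts[2]
    end
    # 3. No matching key: keep the original name.
    return name
end

labels = Dict("col1: b" => "something special", "col1" => "Column 1")
```

With this precedence, specifying both `"col1: b" => "something special"` and `"col1" => "Column 1"` yields "something special" for level b and "Column 1: c" for level c, regardless of how the Dict is stored.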

nilshg · Apr 23 '21 07:04

I've tried it now, and both of these work:

regtable(reg(df, @formula(y ~ mycol1 + mycol2)),
    labels = Dict("mycol2" => "Column 2", "y" => "depvar"),
    transform_labels = Dict("mycol1" => "Column 1")
)
regtable(reg(df, @formula(y ~ mycol1 + mycol2)), transform_labels = labeldict)
---------------------
               depvar
              -------
                  (1)
---------------------
(Intercept)     0.258
              (0.197)
Column 1: b     0.011
              (0.222)
Column 1: c    -0.137
              (0.178)
Column 2        0.242
              (0.298)
---------------------
Estimator         OLS
---------------------
N                  10
R2              0.154
---------------------

transform_labels replaces substrings in all labels, so it only works if mycol1 doesn't appear anywhere else.
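The pitfall is easy to reproduce with Base `replace`, assuming transform_labels applies something similar to each label (its internals aren't shown in this thread). `mycol10` is a hypothetical extra regressor whose name happens to contain `mycol1`. The anchored regex variant only rewrites the name when it starts the label and is followed by `": "` or the end of the string; whether transform_labels itself accepts a Regex is not something this thread establishes.

```julia
# Hypothetical coefficient names; "mycol10" is an invented extra regressor.
names_in_table = ["mycol1: b", "mycol1: c", "mycol2", "mycol10"]

# Plain substring replacement, as described for transform_labels.
naive = [replace(n, "mycol1" => "Column 1") for n in names_in_table]
# "mycol10" is caught too and becomes "Column 10".

# Anchored alternative: match "mycol1" only at the start of a label and only
# when it is followed by ": " or the end of the string.
strict = [replace(n, r"^mycol1(?=: |$)" => "Column 1") for n in names_in_table]
```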

greimel · Apr 23 '21 08:04

Okay, in that case I think it's probably fine as is, maybe with a note in the README for labels saying that transform_labels should be used to rename categorical variables.

nilshg · Apr 23 '21 08:04