
Change base levels for categorical fields?

Open enriquecardenas24 opened this issue 6 months ago • 3 comments

When modeling with glum on a dataset containing both categorical and numeric features, I want to set the base levels of the categorical fields manually. In statsmodels this can be done through the `formula` argument, using `C(column, Treatment(reference=...))`. An example appears in a previous issue I opened, #777. In that issue, jtilly set the base levels in the statsmodels formula to 1.0 so that the model's coefficients would align with those of a glum model.

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# References are base levels for categorical features.
# train_data: pandas DataFrame holding Response and the fields below.
formula = "Response~C(Year, Treatment(reference=1.0))"
formula += "+C(Field16952, Treatment(reference=1.0))"
formula += "+Field16995+Field17024+Field17041"  # all numeric here
formula += "+Field17045"
sm_fam = sm.families.Binomial()
sm_model = smf.glm(formula, train_data, family=sm_fam).fit()
```

Originally posted by @jtilly in https://github.com/Quantco/glum/issues/777#issuecomment-1979470033

Here, Year and Field16952 are categorical features with base level references.
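For context, treatment coding with an explicit reference level produces one 0/1 indicator column for every level except the reference, whose effect is absorbed into the intercept. The sketch below reproduces that encoding with plain pandas; the helper `treatment_dummies` and the toy data are hypothetical and not part of statsmodels or glum.

```python
import pandas as pd


def treatment_dummies(values: pd.Series, reference) -> pd.DataFrame:
    """Indicator columns for every level except `reference` (treatment coding)."""
    # One 0/1 column per observed level, labelled by the level itself.
    dummies = pd.get_dummies(values, dtype=int)
    # The reference level gets no column: its effect sits in the intercept.
    return dummies.drop(columns=[reference])


# Hypothetical toy data standing in for the Year field.
year = pd.Series([1.0, 2.0, 3.0, 1.0], name="Year")
print(treatment_dummies(year, reference=1.0))
```

Rows where Year is 1.0 are all zeros, which is what makes 1.0 the base level that the other coefficients are measured against.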

Is there a way to modify the base levels of categorical features for a glum model?
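One possible workaround, sketched here under an assumption worth checking against the glum documentation for the installed version: glum's `GeneralizedLinearRegressor` has a `drop_first` option that, when `True`, drops the first level of each pandas categorical column. If so, reordering the categories so the desired base level comes first should make it the reference. The helper `set_base_level` and the toy data are hypothetical.

```python
import pandas as pd


def set_base_level(df: pd.DataFrame, column: str, base) -> pd.DataFrame:
    """Return a copy of df in which `base` is the first category of `column`."""
    out = df.copy()
    # Make the column categorical; the observed levels become the categories.
    col = out[column].astype("category")
    # Move the chosen base level to the front, keeping the others in order.
    # reorder_categories raises ValueError if `base` is not an existing level.
    others = [c for c in col.cat.categories if c != base]
    out[column] = col.cat.reorder_categories([base] + others)
    return out


toy = pd.DataFrame({"Year": [3.0, 1.0, 2.0, 1.0], "Response": [0, 1, 0, 1]})
toy = set_base_level(toy, "Year", base=2.0)
print(list(toy["Year"].cat.categories))
# pandas' own drop_first follows the category order, so 2.0 is the dropped level.
print(list(pd.get_dummies(toy["Year"], drop_first=True).columns))
```

The data values are unchanged; only the category order moves. Passing the reordered frame to a model built with something like `GeneralizedLinearRegressor(family="binomial", drop_first=True)` would then, under the assumption above, use 2.0 as the base level for Year.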

enriquecardenas24, Aug 15 '24 19:08