Change base levels for categorical fields?
When modeling with glum on a dataset containing both categorical and numeric features, I want to manually set the base levels for the categorical fields. This can be done in statsmodels models through the "formula" input. An example can be seen in a previous issue I opened, #777. In that issue, jtilly set the base levels in the statsmodels model formula to 1.0 in order to align the coefficients of the statsmodels model with those of a glum model.
import statsmodels.api as sm
import statsmodels.formula.api as smf

# References are base levels for categorical features.
formula = "Response~C(Year, Treatment(reference=1.0))"
formula += "+C(Field16952, Treatment(reference=1.0))"
formula += "+Field16995+Field17024+Field17041"  # all numeric here
formula += "+Field17045"
sm_fam = sm.families.Binomial()
sm_model = smf.glm(formula, train_data, family=sm_fam).fit()
Originally posted by @jtilly in https://github.com/Quantco/glum/issues/777#issuecomment-1979470033
Here, Year and Field16952 are categorical features with base level references.
Is there a way to modify the base levels of categorical features for a glum model?
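For context, here is a sketch of the kind of workaround I have in mind, not a confirmed glum feature. It assumes that glum, when fitted with drop_first=True, drops the first category of each pandas categorical column, so that reordering the categories would effectively choose the base level. The helper name with_base_level and the toy values are made up for illustration; only the column name Year comes from the example above.

```python
import pandas as pd


def with_base_level(column: pd.Series, base) -> pd.Series:
    """Return a categorical copy of `column` whose first category is `base`."""
    cat = column.astype("category")
    if base not in cat.cat.categories:
        raise ValueError(f"{base!r} is not a level of {column.name!r}")
    # Keep the remaining levels in their existing order, after the base.
    others = [level for level in cat.cat.categories if level != base]
    return cat.cat.reorder_categories([base, *others])


# Toy data: the values are invented, only the column name mirrors the issue.
train_data = pd.DataFrame({"Year": [1.0, 2.0, 3.0, 2.0, 1.0]})
train_data["Year"] = with_base_level(train_data["Year"], base=2.0)

print(list(train_data["Year"].cat.categories))

# Assumed follow-up, not run here: fit glum with drop_first=True so that
# the first category (the chosen base) is the one left out.
# from glum import GeneralizedLinearRegressor
# model = GeneralizedLinearRegressor(family="binomial", drop_first=True)
```

The observed values are unchanged; only the category order differs. Whether this matches the statsmodels Treatment(reference=...) behavior exactly is something I have not verified.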