GLM fit without penalty

Open lorentzenchr opened this issue 3 years ago • 4 comments

It would be very nice to also be able to fit completely unpenalized GLMs (as in base R's glm). For categorical features, one then needs to drop one level, called the reference or base level.

from plotnine.data import diamonds
from glum import GeneralizedLinearRegressor


# Use the 4 C's as features.
# Note that cut, color, and clarity have dtype category.
X = diamonds.loc[:, ["carat", "cut", "color", "clarity"]]

# Targets
y = diamonds["price"]

glm = GeneralizedLinearRegressor(alpha=0, family="gamma", link="log")
glm.fit(X, y)

gives

LinAlgError: Matrix is singular.

The error is correct as no categorical level was dropped.

lorentzenchr avatar Dec 20 '21 19:12 lorentzenchr

xref https://github.com/Quantco/tabmat/issues/75

jtilly avatar Dec 20 '21 19:12 jtilly

In applications where we need to drop a base level, we typically one-hot encode our categoricals before using glum. With few levels, that's also faster than using categorical types. As you can see in the tabmat issue, we have discussed this before, but we were a bit unsure what a good API design for this would look like. The easiest solution would be to drop the first category per variable. Do you think that would be good enough?
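
For illustration, a minimal sketch of that workaround on the diamonds example above, using pandas.get_dummies (the exact encoding call is just one way to do it):

import pandas as pd
from plotnine.data import diamonds
from glum import GeneralizedLinearRegressor

X = diamonds.loc[:, ["carat", "cut", "color", "clarity"]]
y = diamonds["price"]

# One-hot encode the categoricals, dropping one reference level per variable
# so that the design matrix has full rank even without a penalty.
X_encoded = pd.get_dummies(
    X, columns=["cut", "color", "clarity"], drop_first=True, dtype=float
)

glm = GeneralizedLinearRegressor(alpha=0, family="gamma", link="log")
glm.fit(X_encoded, y)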

jtilly avatar Dec 20 '21 20:12 jtilly

AFAIK, R also drops the first level, so this seems like a good default, though only in the case of alpha=0. If someone still needs more control, they could go for sklearn.preprocessing.OneHotEncoder(drop=...) or for a formula solution.
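
For instance, a minimal sketch of that scikit-learn route on the diamonds example (the drop="first" choice and column names are just illustrative):

from plotnine.data import diamonds
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from glum import GeneralizedLinearRegressor

X = diamonds.loc[:, ["carat", "cut", "color", "clarity"]]
y = diamonds["price"]

# Drop one reference level per categorical; "carat" is passed through unchanged.
encoder = ColumnTransformer(
    [("ohe", OneHotEncoder(drop="first"), ["cut", "color", "clarity"])],
    remainder="passthrough",
)
X_encoded = encoder.fit_transform(X)

glm = GeneralizedLinearRegressor(alpha=0, family="gamma", link="log")
glm.fit(X_encoded, y)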

lorentzenchr avatar Dec 20 '21 22:12 lorentzenchr

Now that tabmat has support for dropping the first column of a CategoricalMatrix, are there any plans to leverage this to allow direct fitting of unpenalised GLMs (i.e. without creating the design matrix prior to input)?

peterlee18 avatar Mar 08 '22 08:03 peterlee18

Glum now offers a drop_first option (#571).
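
For reference, a minimal sketch of how the original example could then look, assuming drop_first is passed as a keyword to GeneralizedLinearRegressor:

from plotnine.data import diamonds
from glum import GeneralizedLinearRegressor

X = diamonds.loc[:, ["carat", "cut", "color", "clarity"]]
y = diamonds["price"]

# drop_first=True drops one level per categorical, so the unpenalized
# (alpha=0) fit no longer runs into a singular design matrix.
glm = GeneralizedLinearRegressor(
    alpha=0, family="gamma", link="log", drop_first=True
)
glm.fit(X, y)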

lbittarello avatar Mar 15 '23 08:03 lbittarello

That's great. Thank you for all your work!

lorentzenchr avatar Mar 17 '23 06:03 lorentzenchr