Get p-value of a categorical model field?

Open · enriquecardenas24 opened this issue 1 year ago · 1 comment

I am trying to obtain a single p-value for a categorical field in my model. Obtaining p-values for numeric fields is simple with a model.coef_table()['p_value'] call, but getting one p-value for a categorical field is more complicated because the field is split into a separate coefficient for each level.

[image: coefficient table with per-level p-values]

Here, I have two categorical fields ("Year", "Field16952") and one numeric field ("Field16995"). As seen, each level of each categorical field has its own p-value, which makes the p-value of the field as a whole unclear.

To get the p-value of a categorical field, this source indicates that an F-test would be a reliable method, and I already have an implementation of it. However, my question is: is the F-test the only way to obtain the p-value of a categorical field, or is there an easier method in glum that I am overlooking?

enriquecardenas24 · Feb 22 '24 16:02

glum has a built-in method for performing a Wald test, which (among other things) can be used to test the joint significance of several variables (e.g. the levels of a categorical). In that case, it is asymptotically equivalent to an F-test.
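To make the idea concrete: for the null hypothesis that all level coefficients b of a categorical are zero, the Wald statistic is W = b' V^-1 b, where V is the covariance block of those coefficients, and W is compared against a chi-square distribution with as many degrees of freedom as there are tested coefficients. The following is a minimal standard-library sketch of that calculation, not glum's implementation; the function names and the numbers are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, k = math.exp(-half), 2               # closed form for df = 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1    # closed form for df = 1
    # Recurrence: sf(x; k + 2) = sf(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return tail


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def wald_joint_test(coefs, cov):
    """Joint Wald test of H0: all coefs are zero. Returns (statistic, p_value, df)."""
    v_inv_b = solve(cov, coefs)
    stat = sum(b * v for b, v in zip(coefs, v_inv_b))
    df = len(coefs)
    return stat, chi2_sf(stat, df), df


# Illustrative (made-up) estimates for two levels of one categorical field.
coefs = [0.30, -0.10]
cov = [[0.010, 0.002],
       [0.002, 0.020]]
stat, p_value, df = wald_joint_test(coefs, cov)
print(f"W = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

In practice the coefficients and their covariance block would come from the fitted model, which is what the built-in method handles for you.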

For example, if you'd like to get a p-value for the categorical variable Year, you could do something like:

model.wald_test(features=["Year__2", "Year__3", "Year__4", "Year__5", "Year__6", "Year__7"])

In the upcoming glum v3 release, this will become even simpler:

model.wald_test(terms="Year")
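Before v3, the list of level columns does not have to be typed by hand. A small sketch, assuming the expanded columns follow the `<field>__<level>` naming seen in the example above; `level_features` is a hypothetical helper, not part of glum, and the feature names below are made up.

```python
def level_features(feature_names, field, sep="__"):
    """Return the names that look like level columns of the categorical `field`."""
    prefix = f"{field}{sep}"
    return [name for name in feature_names if name.startswith(prefix)]


# Made-up feature names in the style of the coefficient table discussed above.
names = ["intercept", "Year__2", "Year__3", "Field16952__B", "Field16995"]
year_levels = level_features(names, "Year")
print(year_levels)

# With a fitted model, the result could then be passed on (not run here;
# assumes the model exposes its expanded column names as `feature_names_`):
# model.wald_test(features=level_features(model.feature_names_, "Year"))
```

Matching on the full `Year__` prefix rather than just `Year` avoids picking up unrelated fields whose names merely start with the same letters.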

stanmart · Feb 22 '24 21:02