glum
User facing API for specifying linear models terms
I've seen that glum version 3 will get a formula interface, much like R's glm, using formulaic. This is a great step towards better usability.
I wanted to ask whether there is appetite for yet another way to specify models, based on the following requirements:
- High-level interface, much like Wilkinson formulae. No scikit-learn pipeline needed.
- (Some) Autocompletion support / Programmatic approach (formulaic uses a string, so no autocomplete)
- Context free (formulaic saves the current scope / context)
- Specify penalties. It would be nice to be able to specify penalties per term, e.g. an L2 difference penalty for a B-spline, L2 for a categorical feature, or a group L2 or group L1 for another categorical feature. More sophisticated: a geo-penalty.
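To make the requirements above more concrete, here is a rough sketch of what such a programmatic specification could look like. Everything in it is hypothetical: `Term`, `ModelSpec`, the helper functions, the penalty names and the column names are made up for illustration and are not part of glum or formulaic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Term:
    """One model term with an optional penalty (hypothetical, not a glum class)."""

    column: str
    kind: str = "numeric"          # "numeric", "categorical" or "bspline"
    penalty: Optional[str] = None  # e.g. "l2", "l2_difference", "group_l1"
    strength: float = 0.0
    df: Optional[int] = None       # only meaningful for spline terms


# Plain functions instead of a formula string, so editors can autocomplete them.
def numeric(column: str) -> Term:
    return Term(column)


def categorical(column: str, penalty: Optional[str] = None, strength: float = 0.0) -> Term:
    return Term(column, kind="categorical", penalty=penalty, strength=strength)


def bspline(column: str, df: int, penalty: Optional[str] = None, strength: float = 0.0) -> Term:
    return Term(column, kind="bspline", penalty=penalty, strength=strength, df=df)


@dataclass(frozen=True)
class ModelSpec:
    """Context free: holds only what is passed in, no captured caller scope."""

    response: str
    terms: Tuple[Term, ...]

    def penalized_terms(self) -> Tuple[Term, ...]:
        return tuple(t for t in self.terms if t.penalty is not None)


# Made-up column names, purely for illustration.
spec = ModelSpec(
    response="claims",
    terms=(
        numeric("exposure"),
        bspline("age", df=4, penalty="l2_difference", strength=10.0),
        categorical("region", penalty="group_l1", strength=0.5),
        categorical("brand", penalty="l2", strength=1.0),
    ),
)
```

The specification is plain data, so it can be inspected, serialized or built in a loop without parsing a string.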
Thanks! I am also excited for the `formulaic`-based formula interface to be released in v3 as a tool for fast exploratory model building.
In my opinion, there is still a lot of room for development within the `formulaic`-based framework. One can add stateful transforms and modify the tabmat materializer and there is also the possibility to add features to formulaic itself. Therefore, I would first try out and optimize the `formulaic`-based framework for some time and later assess if a third way of specifying models is warranted.
As to your points:
Context free
The context can already be turned off by passing an empty dict. We could make this more explicit, e.g., by allowing `context=False`, at the cost of moving away from formulaic's conventions.
Specify penalties
I think that this could be quite interesting. A related feature is [smoothness penalties for splines](https://github.com/Quantco/glum/issues/471#issuecomment-1821542714). Again, this could be incorporated within the formulaic-based framework. If one wanted to, e.g., be able to specify a penalized spline as something like `bs(x, df=4, degree=3, cyclic_penalty=10)`, then one could write a stateful transform for that penalized spline and adjust the `TabmatMaterializer` to return a penalty matrix that corresponds to the desired penalty.
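As a rough illustration of what such a penalty matrix could be, the sketch below builds a first-order difference penalty `strength * D^T D` over the spline coefficients, with an optional wrap-around row. Reading `cyclic_penalty=10` as "cyclic difference penalty with strength 10" is only an assumption, and the function names are hypothetical; this is plain Python, not glum or formulaic code.

```python
from typing import List


def first_difference_rows(n_coefs: int, cyclic: bool = False) -> List[List[float]]:
    """Rows of D, where row i encodes coef[i + 1] - coef[i].

    With cyclic=True an extra row links the last coefficient back to the first.
    """
    n_rows = n_coefs if cyclic else n_coefs - 1
    rows = []
    for i in range(n_rows):
        row = [0.0] * n_coefs
        row[i] = -1.0
        row[(i + 1) % n_coefs] = 1.0
        rows.append(row)
    return rows


def difference_penalty(n_coefs: int, strength: float, cyclic: bool = False) -> List[List[float]]:
    """Square penalty matrix strength * D^T D for one spline term."""
    d = first_difference_rows(n_coefs, cyclic)
    return [
        [strength * sum(r[i] * r[j] for r in d) for j in range(n_coefs)]
        for i in range(n_coefs)
    ]


# Four spline coefficients, assumed cyclic penalty of strength 10.
penalty = difference_penalty(4, strength=10.0, cyclic=True)
```

Every row of the result sums to zero, so a constant shift of all coefficients is not penalized; only differences between neighbouring coefficients are. A matrix of this shape is the kind of object that could be placed in the block of a full penalty matrix belonging to the spline's columns.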
Autocompletion support
I agree that this would probably require a different approach.
I would be curious to know though if you have a specific formula library in mind or if you are suggesting developing one from the ground up.
Coming as part of Glum 3.