
User facing API for specifying linear models terms

Open lorentzenchr opened this issue 1 year ago • 1 comments

I've seen that glum version 3 will get a formula interface, much like R's glm, using formulaic. This is a great step toward better usability.

I wanted to ask whether there is appetite for yet another way to specify models, based on the following requirements:

  • High-level interface, much like Wilkinson formulae. No scikit-learn pipeline needed.
  • (Some) autocompletion support / programmatic approach (formulaic uses a string, so no autocomplete).
  • Context-free (formulaic saves the current scope / context).
  • Specify penalties. It would be nice to be able to specify penalties per term, e.g. an L2-difference penalty for a B-spline, L2 for a categorical feature, or a group L2 or group L1 for another categorical feature. More sophisticated: a geo-penalty.
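
To make these requirements concrete, here is a minimal sketch of what a programmatic, context-free specification with per-term penalties might look like. Every name in it (`Penalty`, `Term`, `Spline`, `Categorical`, `ModelSpec`) is hypothetical; none of this is glum or formulaic API, and nothing is fitted. It only shows that plain objects give autocompletion on parameters and carry no reference to the caller's scope.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Penalty:
    """Hypothetical per-term penalty, e.g. 'l2', 'group_l1', 'l2_difference'."""
    kind: str
    strength: float = 1.0


@dataclass(frozen=True)
class Term:
    """Hypothetical linear term on a single column, optionally penalized."""
    column: str
    penalty: Optional[Penalty] = None


@dataclass(frozen=True)
class Spline(Term):
    """Hypothetical B-spline term; df and degree mirror the usual bs() arguments."""
    df: int = 4
    degree: int = 3


@dataclass(frozen=True)
class Categorical(Term):
    """Hypothetical categorical term."""
    drop_first: bool = True


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container: a response name plus a tuple of terms."""
    response: str
    terms: Tuple[Term, ...]

    def describe(self) -> List[str]:
        # One human-readable line per term, including its penalty (if any).
        lines = []
        for term in self.terms:
            if term.penalty is None:
                pen = "none"
            else:
                pen = f"{term.penalty.kind}({term.penalty.strength})"
            lines.append(f"{type(term).__name__}({term.column}): penalty={pen}")
        return lines


# The spec is built only from explicit arguments, so no calling scope is captured.
spec = ModelSpec(
    response="claims",
    terms=(
        Spline("age", penalty=Penalty("l2_difference", 10.0), df=4, degree=3),
        Categorical("region", penalty=Penalty("group_l1", 0.5)),
        Term("exposure"),
    ),
)

for line in spec.describe():
    print(line)
```

Because each term is an ordinary object, an editor can complete `df=`, `degree=`, or `penalty=`, which a formula string cannot offer.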

lorentzenchr · Nov 07 '23 11:11

Thanks! I am also excited for the formulaic-based formula interface to be released in v3 as a tool for fast exploratory model building.

In my opinion, there is still a lot of room for development within the formulaic-based framework. One can add stateful transforms, modify the tabmat materializer, or add features to formulaic itself. Therefore, I would first try out and optimize the formulaic-based framework for some time and only later assess whether a third way of specifying models is warranted.

As to your points:

Context free

The context can already be turned off by passing an empty dict. We could make this more explicit, e.g., by allowing context=False, at the cost of moving away from formulaic's conventions.

Specify penalties

I think that this could be quite interesting. A related feature is [smoothness penalties for splines](https://github.com/Quantco/glum/issues/471#issuecomment-1821542714). Again, this could be incorporated within the formulaic-based framework. If one wanted to be able to specify a penalized spline as something like bs(x, df=4, degree=3, cyclic_penalty=10), for example, one could write a stateful transform for that penalized spline and adjust the TabmatMaterializer to return a penalty matrix that corresponds to the desired penalty.
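
As an illustration of what such a penalty matrix could look like, the sketch below builds a difference penalty of the kind commonly used for penalized B-splines (P-splines): `strength * D.T @ D`, where `D` takes finite differences of adjacent coefficients. The function name `difference_penalty` and its `cyclic` flag are assumptions made for this example, not part of glum, tabmat, or formulaic, and the wiring through a stateful transform and the materializer is not shown.

```python
import numpy as np


def difference_penalty(n_coef, order=2, strength=1.0, cyclic=False):
    """Return strength * D.T @ D for an order-th difference operator D.

    Hypothetical helper for illustration only.
    """
    if n_coef <= order:
        raise ValueError("n_coef must be larger than the difference order")
    if cyclic:
        # First difference with wrap-around: row i is e_{(i+1) mod n} - e_i.
        first = np.roll(np.eye(n_coef), -1, axis=0) - np.eye(n_coef)
        d = np.linalg.matrix_power(first, order)
    else:
        # Rows are order-th finite differences of the identity; shape (n - order, n).
        d = np.diff(np.eye(n_coef), n=order, axis=0)
    return strength * d.T @ d


# Second-order penalty for a spline with five coefficients.
P = difference_penalty(5, order=2, strength=10.0)

# Constant and linear coefficient vectors are not penalized by a
# (non-cyclic) second-order difference penalty.
beta_linear = np.arange(5.0)
print(beta_linear @ P @ beta_linear)
```

The quadratic form `beta @ P @ beta` is what the penalty adds to the objective, so the result is a matrix of the kind a Tikhonov-style penalty argument would take. With `cyclic=True`, the first and last coefficients are also tied together, so only constant coefficient vectors remain unpenalized.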

Autocompletion support

I agree that this would probably require a different approach.

I would be curious to know though if you have a specific formula library in mind or if you are suggesting developing one from the ground up.

Coming as part of Glum 3.

lbittarello · Apr 03 '24 14:04