spreg
enh: specify models using patsy formulas
I think it would be a major boon to usability to allow users to specify models using patsy formulas with geodataframes.
It looks like integrating it would be pretty straightforward, and I'm happy to get it started if it would be a welcome addition, so wanted to raise for discussion:
- are folks on board with this idea?
- if so, do we want two APIs (like statsmodels) or a single one that could accept either signature?
this is something we talked about a couple of years ago, but i don't think patsy was quite mature enough then. ljwolf may have worked on this. i'm all for it, also to make sure the new panel stuff we are working on conforms. probably need a hangout to agree on API.
awesome. I'll start some experiments as a proof of concept, then maybe we can circle back on some API choices
This is awesome, @knaaptime. As Luc said, we talked about this at some point, and also about changing the way we show the results.
I tend to always favor a single API that takes both structures. This is how I am currently working for panels: the code can take data in either long or wide formats. It’s just a matter of a try/except statement to figure out what to do. So my initial feeling is that a “don’t ask, just do” approach is better. But it would be great to further discuss this over a hangout!
cool, that's what I'll start first. After some poking around last night, I'm pretty sure we're in exactly the situation described in the patsy docs, which would basically mean just adding a data argument to the existing functions that patsy would use to generate the design matrices internally (otherwise, we just fall back to the existing API with data=None). Everything else is already set up nicely (e.g. we could use the patsy metadata to pass names to the name_xlist argument)
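For illustration, a minimal sketch of the single-signature idea. It is not spreg's code: `fit_ols` is a hypothetical stand-in for an existing estimator, and the `formula` keyword is an assumption added alongside the proposed `data` argument. When a formula is given, patsy builds the design matrices and supplies the variable names; otherwise the array-based path runs unchanged.

```python
import numpy as np
import pandas as pd
import patsy


def fit_ols(y=None, x=None, name_y=None, name_x=None, formula=None, data=None):
    """Hypothetical estimator accepting either arrays or a formula plus a dataframe."""
    if formula is not None:
        # Formula path: patsy builds y and X from the (geo)dataframe columns.
        y_dm, x_dm = patsy.dmatrices(formula, data, return_type="dataframe")
        # patsy's design metadata carries the column names.
        name_y = y_dm.design_info.column_names[0]
        name_x = list(x_dm.design_info.column_names)
        y, x = y_dm.to_numpy(), x_dm.to_numpy()
    else:
        # Existing path: the caller passes arrays (constant column included in x here).
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        x = np.asarray(x, dtype=float)
    # Ordinary least squares, just to have something to return.
    betas, *_ = np.linalg.lstsq(x, y, rcond=None)
    return {"betas": betas.ravel(), "name_y": name_y, "name_x": name_x}


df = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0]})
df["y"] = 1.0 + 2.0 * df["x1"]

from_formula = fit_ols(formula="y ~ x1", data=df)
from_arrays = fit_ols(df["y"].to_numpy(), np.column_stack([np.ones(len(df)), df["x1"]]))
```

Both calls fit the same model; only the formula call gets its names filled in automatically. A geodataframe should work the same way as long as the formula only references non-geometry columns.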
maybe we could chat a bit about this at the next dev meeting?
Yeah, I recall writing a prototype back in 2016, and we couldn't agree on whether it should be a separate constructor (like... ML_Lag.from_formula()) or a separate module with different inits (like spreg.formula.ML_Lag), and how to deal with the instrumental variable formulas.
I like .from_formula(), and I think having an instrumental variables equation like nlm in R makes sense, more than extending the patsy grammar.
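A rough sketch of what the `.from_formula()` option could look like, with instruments given as their own formula rather than through new patsy syntax. Everything here is hypothetical: `ToyTSLS`, its arguments, and the `instruments` keyword are made up for illustration and no estimation is done; only the patsy calls are real.

```python
import numpy as np
import pandas as pd
import patsy


class ToyTSLS:
    """Hypothetical array-based class standing in for an existing estimator."""

    def __init__(self, y, x, q=None, name_x=None, name_q=None):
        self.y, self.x, self.q = y, x, q
        self.name_x, self.name_q = name_x, name_q

    @classmethod
    def from_formula(cls, formula, data, instruments=None, **kwargs):
        # Main equation: patsy builds y and X from the dataframe columns.
        y_dm, x_dm = patsy.dmatrices(formula, data, return_type="dataframe")
        q, name_q = None, None
        if instruments is not None:
            # Instruments come from a separate one-sided formula (assumed
            # convention), so the patsy grammar itself is left untouched.
            q_dm = patsy.dmatrix(instruments, data, return_type="dataframe")
            q, name_q = q_dm.to_numpy(), list(q_dm.columns)
        # Hand plain arrays to the existing array-based constructor.
        return cls(y_dm.to_numpy(), x_dm.to_numpy(), q=q,
                   name_x=list(x_dm.columns), name_q=name_q, **kwargs)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10, 4)), columns=["y", "x1", "z1", "z2"])
model = ToyTSLS.from_formula("y ~ x1", df, instruments="z1 + z2 - 1")
```

The array-based `__init__` stays as it is, so existing code keeps working and the formula handling lives in one place.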
Just to make the plug here too, I've added a comment over at mgwr#77 that affects this discussion too.
not only is there a potential issue with spatial lags, there are also regime variables. how would those fit into the patsy syntax? same with spatially lagged explanatory variables (SLX, spatial Durbin), ideally computed on the fly (but not in the current implementation). and where would the weights be specified?
To @darribas's point, if the patsy approach would be used throughout multiple packages (spreg, spvcm, spint, tobler, ....) then maybe we should think about putting it in libpysal?
I was thinking that, or even in pysal? It's not something "core" as weights would be, but something built "atop" the federation. Just to throw an idea out.