spreg

enh: specify models using patsy formulas

Open knaaptime opened this issue 5 years ago • 9 comments

I think it would be a major boon to usability to allow users to specify models using patsy formulas with geodataframes.

It looks like integrating it would be pretty straightforward, and I'm happy to get it started if it would be a welcome addition, so wanted to raise for discussion:

  1. are folks onboard with this idea?
  2. if so, do we want two APIs (like statsmodels) or a single one that could accept either signature?

knaaptime, Feb 14 '20 16:02

this is something we talked about a couple of years ago, but i don't think patsy was quite mature enough then. ljwolf may have worked on this. i'm all for it, also to make sure the new panel stuff we are working on conforms. probably need a hangout to agree on API.

lanselin, Feb 14 '20 17:02

awesome. I'll start some experiments as a proof of concept, then maybe we can circle back on some API choices

knaaptime, Feb 14 '20 17:02

This is awesome, @knaaptime. As Luc said, we talked about this at some point, and also about changing the way we show the results.

I tend to always favor a single API that takes both structures. This is how I am currently working for panels: the code can take data in either long or wide formats. It’s just a matter of a try/except statement to figure out what to do. So my initial feeling is that a “don’t ask, just do” approach is better. But it would be great to further discuss this over a hangout!

pedrovma, Feb 14 '20 17:02

cool, that's what I'll start first. After some poking around last night, I'm pretty sure we're in exactly the situation described in the patsy docs [image]

which would basically mean just adding a data argument to the existing functions that patsy would use to generate the design matrices internally (otherwise, we just fall back to the existing API with data=None). Everything else is already set up nicely (e.g. we could use the patsy metadata to pass names to the name_xlist argument)
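A minimal sketch of that idea, not spreg code: `build_design` is a hypothetical helper, and dropping patsy's `Intercept` column assumes the estimator adds its own constant. It only uses `patsy.dmatrices` and the `design_info.column_names` metadata.

```python
import numpy as np
from patsy import dmatrices


def build_design(y, x=None, data=None, name_y=None, name_x=None):
    """Return (y, x, name_y, name_x) from either arrays or a formula.

    Hypothetical helper, not part of spreg. With data=None, `y` and `x`
    are treated as arrays (the existing API). Otherwise `y` is read as a
    patsy formula string and evaluated against `data`.
    """
    if data is None:
        # Fall back to the array-based signature unchanged.
        return np.asarray(y), np.asarray(x), name_y, name_x

    y_dm, x_dm = dmatrices(y, data)
    names = list(x_dm.design_info.column_names)
    x_arr = np.asarray(x_dm)
    # Assumption: the estimator adds the constant itself, so drop patsy's.
    if "Intercept" in names:
        keep = [i for i, n in enumerate(names) if n != "Intercept"]
        x_arr = x_arr[:, keep]
        names = [names[i] for i in keep]
    # The patsy metadata supplies the variable names.
    return np.asarray(y_dm), x_arr, y_dm.design_info.column_names[0], names


data = {
    "y": np.array([1.0, 2.0, 3.0, 4.0]),
    "a": np.array([0.0, 1.0, 0.0, 1.0]),
    "b": np.array([2.0, 1.0, 4.0, 3.0]),
}
y_arr, x_arr, name_y, name_x = build_design("y ~ a + b", data=data)
```

The returned `name_x` list is what could be passed on to `name_xlist`; a geodataframe could stand in for the plain dict used here.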

maybe we could chat a bit about this at the next dev meeting?

knaaptime, Feb 14 '20 17:02

Yeah, I recall writing a prototype back in 2016, and we couldn't agree on whether it should be a separate constructor (like... ML_Lag.from_formula()) or a separate module with different inits (like spreg.formula.ML_Lag), and how to deal with the instrumental variable formulas.

I like .from_formula(), and I think having an instrumental variables equation like nlm in R makes sense, more than extending the patsy grammar.
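Roughly, the alternate-constructor idea with a separate instruments equation could look like the sketch below. `ToyIVModel` and the `endog` / `instruments` argument names are hypothetical stand-ins, not anything in spreg; only `patsy.dmatrices` and `patsy.dmatrix` are real calls.

```python
import numpy as np
from patsy import dmatrix, dmatrices


class ToyIVModel:
    """Hypothetical stand-in for an estimator; it only stores its inputs."""

    def __init__(self, y, x, yend=None, q=None, name_x=None):
        self.y, self.x, self.yend, self.q = y, x, yend, q
        self.name_x = name_x

    @classmethod
    def from_formula(cls, formula, data, endog=None, instruments=None):
        # "- 1" removes patsy's intercept (assumes the model adds its own).
        y, x = dmatrices(formula + " - 1", data)
        yend = q = None
        if endog is not None:
            if instruments is None:
                raise ValueError("endog requires an instruments formula")
            # Right-hand-side-only formulas, kept outside the main equation.
            yend = np.asarray(dmatrix(endog + " - 1", data))
            q = np.asarray(dmatrix(instruments + " - 1", data))
        return cls(np.asarray(y), np.asarray(x), yend, q,
                   name_x=list(x.design_info.column_names))


rng = np.random.default_rng(0)
data = {name: rng.normal(size=10) for name in ["y", "x1", "p", "z1", "z2"]}
model = ToyIVModel.from_formula("y ~ x1", data, endog="p", instruments="z1 + z2")
```

The existing `__init__` stays untouched, and the instruments go in their own formula, so the patsy grammar itself is not extended.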

ljwolf, Feb 15 '20 13:02

Just to make the plug here too, I've added a comment over at mgwr#77 that affects this discussion too.

darribas, Feb 17 '20 12:02

not only is there a potential issue with spatial lags, there are also regime variables. how would those fit into the patsy syntax? same with spatially lagged explanatory variables (SLX, spatial Durbin), ideally computed on the fly (but not in the current implementation). and where would the weights be specified?

lanselin, Feb 17 '20 14:02

> Just to make the plug here too, I've added a comment over at mgwr#77 that affects this discussion too.

To @darribas's point, if the patsy approach would be used throughout multiple packages (spreg, spvcm, spint, tobler, ....) then maybe we should think about putting it in libpysal?

sjsrey, Feb 17 '20 15:02

> To @darribas's point, if the patsy approach would be used throughout multiple packages (spreg, spvcm, spint, tobler, ....) then maybe we should think about putting it in libpysal?

I was thinking that, or even in pysal? It's not something "core" as weights would be, but something built "atop" the federation. Just to throw an idea out.

darribas, Feb 17 '20 17:02