patsy
dmatrices raising “AssertionError”
Totally inexperienced user. My first Negative Binomial Regression. iPython on Google's Colab. I load the dataset as a pandas df. The features (and Target) in the formula below all appear in the df (which I named "dataset").
I also bring in
from patsy import dmatrices
import statsmodels.api as sm
However, when I run
formula = """Target ~ MeanAge + %White + %HHsNotWater + HHsIneq*10 + %NotSaLang + %male + %Informal + COGTACatG2B09 + %Poor + AGRating """
data = dataset
response, predictors = dmatrices(formula, data, return_type='dataframe')
nb_results = sm.GLM(response, predictors, family=sm.families.NegativeBinomial(alpha=0.15)).fit()
print(nb_results.summary())
I simply get "AssertionError: ", and an arrow to line four (the one starting "response"). I have no idea how to remedy this, and cannot figure out why it happens. Is it a Patsy issue? A Colab issue? A daft coding issue? Any sage guidance, please?
Sounds like your environment is messed up somehow – python errors should always have more details than that!
I don't know what's causing it, but some issues jump out at me in your formula:
- Your variable names aren't valid Python identifiers. Patsy allows arbitrary Python code in the formula, so it's trying to interpret those `%`s as the Python modulo operator. You need to use `Q('%White')` to quote the variable name.
- Patsy gives special meaning to `*` (for interactions), so it's not going to interpret `HHsIneq*10` the way you want. To make sure you get the Python version of `*`, you can wrap the expression in `I(HHsIneq*10)`. It's like the opposite of quoting it :-)
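For illustration, here is a minimal sketch of the two fixes applied to a cut-down version of the formula. The data frame below is made up (only the column names come from the original post), and only three of the predictors are included, so treat it as a demonstration of `Q()` and `I()` rather than the actual model.

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical toy data; the values are invented for the example.
df = pd.DataFrame({
    "Target": [3, 1, 4, 1, 5, 9],
    "MeanAge": [30.0, 41.5, 27.2, 35.1, 50.3, 44.0],
    "%White": [0.10, 0.25, 0.05, 0.40, 0.33, 0.12],
    "HHsIneq": [0.5, 0.7, 0.6, 0.9, 0.4, 0.8],
})

# Q('...') quotes a column name that is not a valid Python identifier.
# I(...) makes patsy evaluate * as plain multiplication, not an interaction.
formula = "Target ~ MeanAge + Q('%White') + I(HHsIneq * 10)"

response, predictors = dmatrices(formula, df, return_type="dataframe")
print(predictors.columns.tolist())
```

The resulting `response` and `predictors` frames can then be passed to `sm.GLM(...)` as in the original post.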
@njsmith - that was it! After fixing the formula, all fell into place. Many thanks.
I have a similar issue. When I run the following, I get a blank assertion error as well:
`outcome_1, predictors_1 = patsy.dmatrices("Q('%received 18+') ~ n_killed_capita_2016", aggregate_2016_df)
mod_1 = sm.OLS(outcome_1, predictors_1)
res_1 = mod_1.fit()
print(res_1.summary())`
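The snippet alone doesn't show what triggers the error here. One way to take quoting out of the picture, assuming the column can be renamed, is to give it a valid Python identifier before building the formula. This is a rough sketch only: the data values and the new name `pct_received_18_plus` are hypothetical.

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical stand-in for aggregate_2016_df; the values are invented.
aggregate_2016_df = pd.DataFrame({
    "%received 18+": [0.61, 0.72, 0.55, 0.80, 0.67],
    "n_killed_capita_2016": [1.2, 0.8, 2.1, 0.5, 1.0],
})

# Rename the awkward column so the formula needs no Q(...) quoting.
renamed = aggregate_2016_df.rename(
    columns={"%received 18+": "pct_received_18_plus"}
)

outcome, predictors = dmatrices(
    "pct_received_18_plus ~ n_killed_capita_2016",
    renamed,
    return_type="dataframe",
)
print(predictors.head())
```

If the renamed version runs cleanly, the problem probably lies in how the quoted name is parsed. If it still fails, the environment is the more likely culprit.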