
Patsy prints uninformative error message when user places "Intercept" in a formula

Open setjmp opened this issue 12 years ago • 1 comments

This is observed on patsy 0.1.0.

I saw that the design_info object of a design matrix uses "Intercept" as the name of the intercept term, so I wondered what would happen if a programmer chose the same name for a feature.

The ideal scenario is that patsy either (a) does some name mangling, or (b) raises an error telling me exactly what I did wrong, if this is not going to be supported.

What happens in reality is that a bare AssertionError with no message is raised:

Traceback (most recent call last):
  File "failure.py", line 5, in <module>
    y,X = patsy.dmatrices("sl ~ Intercept",dataFrame)
  File "build/bdist.macosx-10.8-intel/egg/patsy/highlevel.py", line 283, in dmatrices
  File "build/bdist.macosx-10.8-intel/egg/patsy/highlevel.py", line 150, in _do_highlevel_design
  File "build/bdist.macosx-10.8-intel/egg/patsy/build.py", line 860, in build_design_matrices
  File "build/bdist.macosx-10.8-intel/egg/patsy/build.py", line 776, in _build
  File "build/bdist.macosx-10.8-intel/egg/patsy/build.py", line 757, in design_info
  File "build/bdist.macosx-10.8-intel/egg/patsy/design_info.py", line 78, in __init__
AssertionError

Here is the code that produces the error:

import pandas
import patsy

dataFrame = pandas.read_csv("salary2.txt")
y,X = patsy.dmatrices("sl ~ Intercept",dataFrame)

setjmp avatar Feb 04 '13 04:02 setjmp
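Option (b) from the report, an error that says exactly what went wrong, could look roughly like the sketch below. It is plain Python and does not call patsy at all: `ReservedNameError` and `check_factor_names` are hypothetical names invented for illustration, and the only detail taken from the issue is that the intercept column is called "Intercept".

```python
# Name the issue reports design_info using for the intercept column.
RESERVED_TERM_NAME = "Intercept"


class ReservedNameError(ValueError):
    """Hypothetical error type for illustration; not part of patsy."""


def check_factor_names(factor_names, reserved=RESERVED_TERM_NAME):
    """Return the names unchanged, or raise if one reuses the reserved name."""
    if any(name == reserved for name in factor_names):
        raise ReservedNameError(
            "factor name %r collides with the name used for the intercept "
            "column; rename the variable (for example to %r) before using "
            "it in a formula" % (reserved, reserved + "_")
        )
    return list(factor_names)
```

Calling `check_factor_names(["Intercept"])` raises an error whose message names the offending factor, instead of the bare AssertionError above.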

Oo, sneaky.

Yeah, we should probably do some name mangling, since the same thing could happen when people create custom factor objects. Maybe I can re-use the name mangling for the automatic name creation (i.e. just say that unnamed columns are called "x" and then let the name mangler turn that into "x1", "x2", ...).

njsmith avatar Feb 04 '13 05:02 njsmith
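The name-mangling idea in the reply above can be sketched in a few lines of plain Python. This is only one possible reading of it, not patsy's implementation: `mangle_names` and its `default` parameter are hypothetical, unnamed columns are represented here as `None`, and every member of a clashing group gets a numeric suffix.

```python
from collections import Counter


def mangle_names(names, default="x"):
    """Make column names unique by appending 1, 2, ... to clashing names.

    Unnamed columns (None) are first given the name `default`, so they
    always go through the suffixing step.
    """
    filled = [default if name is None else name for name in names]
    counts = Counter(filled)
    # Explicit names that occur exactly once are kept unchanged and reserved.
    used = {
        name
        for name, original in zip(filled, names)
        if original is not None and counts[name] == 1
    }
    next_index = {}
    result = []
    for name, original in zip(filled, names):
        if original is not None and counts[name] == 1:
            result.append(name)
            continue
        # Find the next suffix that does not collide with a name already used.
        i = next_index.get(name, 1)
        candidate = "%s%d" % (name, i)
        while candidate in used:
            i += 1
            candidate = "%s%d" % (name, i)
        next_index[name] = i + 1
        used.add(candidate)
        result.append(candidate)
    return result
```

With this sketch, `mangle_names(["Intercept", "Intercept", "sl"])` gives `["Intercept1", "Intercept2", "sl"]`. A real fix might prefer to leave the built-in intercept column as "Intercept" and rename only the user's factor; that is a design choice this sketch does not settle.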