Patsy prints uninformative error message when user places "Intercept" in a formula
This is observed on patsy 0.1.0.
I saw that the design_info object of a design matrix uses "Intercept" as the encoding for the intercept term so I wondered what would happen if a programmer chose this as the name for a feature.
The ideal scenario is that patsy either (a) does some name mangling, or (b) raises an error telling me exactly what I did wrong, if this is not going to be supported.
What happens in reality is that an uninformative AssertionError with no message is raised:
Traceback (most recent call last):
  File "failure.py", line 5, in <module>
    y,X = patsy.dmatrices("sl ~ Intercept",dataFrame)
  File "build/bdist.macosx-10.8-intel/egg/patsy/highlevel.py", line 283, in dmatrices
  File "build/bdist.macosx-10.8-intel/egg/patsy/highlevel.py", line 150, in _do_highlevel_design
  File "build/bdist.macosx-10.8-intel/egg/patsy/build.py", line 860, in build_design_matrices
  File "build/bdist.macosx-10.8-intel/egg/patsy/build.py", line 776, in _build
  File "build/bdist.macosx-10.8-intel/egg/patsy/build.py", line 757, in design_info
  File "build/bdist.macosx-10.8-intel/egg/patsy/design_info.py", line 78, in __init__
AssertionError
Here is the code that produces the error:
import pandas
import patsy
dataFrame = pandas.io.parsers.read_csv("salary2.txt")
y,X = patsy.dmatrices("sl ~ Intercept",dataFrame)
Oo, sneaky.
Yeah, we should probably do some name mangling, since the same thing could happen when people create custom factor objects. Maybe I can re-use the name mangling for the automatic name creation (i.e. just say that unnamed columns are called "x" and then let the name mangler turn that into "x1", "x2", ...).
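To make the idea concrete, here is a rough sketch of what such a name mangler might look like. This is not patsy code: `mangle_names`, its `default` parameter, and the convention of using `None` for an unnamed column are all hypothetical, chosen only to illustrate the approach. Names that occur once are kept as-is, and names that collide get numeric suffixes.

```python
from collections import Counter


def mangle_names(names, default="x"):
    """Return a list of unique column names (hypothetical helper, not patsy API).

    None stands for an unnamed column and is treated as `default`.
    Names that occur once are kept; names that occur more than once
    get the suffixes 1, 2, ... appended, skipping any suffix that
    would collide with a name already in use.
    """
    filled = [default if n is None else n for n in names]
    counts = Counter(filled)
    # Names that are already unique are reserved up front.
    taken = {n for n in filled if counts[n] == 1}
    next_suffix = {}
    result = []
    for name in filled:
        if counts[name] == 1:
            result.append(name)
            continue
        k = next_suffix.get(name, 1)
        while f"{name}{k}" in taken:
            k += 1
        candidate = f"{name}{k}"
        taken.add(candidate)
        next_suffix[name] = k + 1
        result.append(candidate)
    return result


# The built-in intercept plus a user column named "Intercept",
# followed by two unnamed columns.
print(mangle_names(["Intercept", "Intercept", None, None]))
```

With this scheme both colliding "Intercept" columns get renamed. Whether the built-in intercept should instead keep its plain name and only the user's column be mangled is a design choice this sketch leaves open.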