pandera
Add special case to check values of `str` column
Describe the bug
FYI @jeffzi @dineshkumar-23
The bug is clearly described here. Basically, since `str` dtype arrays are translated to numpy object arrays, any object can exist within such a column and still pass validation.
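To illustrate the problem without depending on any particular pandera version, here is a minimal sketch using only pandas and numpy. It assumes the column is stored with `object` dtype, which is how the `str` columns described above are represented; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# A genuine string column, stored as a numpy object array.
strings = pd.Series(["a", "b", "c"], dtype=object)

# A column holding arbitrary Python objects, also stored as a numpy object array.
mixed = pd.Series(["a", 1, 2.5, ["nested"]], dtype=object)

# Both columns report exactly the same dtype, so a check that only
# compares dtypes cannot tell them apart.
same_dtype = strings.dtype == mixed.dtype == np.dtype("O")
print(same_dtype)
```

A dtype-level comparison therefore accepts `mixed` as readily as `strings`, which is why the values themselves need to be inspected.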
There has been `pandas.StringDtype` since pandas 1.0, but I think it's still important to special-case this type because (i) many users may not be aware of it and (ii) I think pandera should start getting into the business of correcting some of pandas' quirks, especially when it comes to the type system.
The special-casing should be implemented at the `DataType` definition (i.e. `pandera.engines.numpy_engine.String`) after we have an API for logical data types: https://github.com/pandera-dev/pandera/pull/798.
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandera.
- [X] (optional) I have confirmed this bug exists on the master branch of pandera.
Code Sample, a copy-pastable example
See https://github.com/pandera-dev/pandera/discussions/807
Expected behavior
Failure cases of non-string objects in a numpy object array (aka a string column) should be correctly reported.
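As a rough sketch of what "correctly reported" could look like, the helper below collects the non-string elements of an object column together with their index labels. `non_string_failure_cases` is a hypothetical name for this example, not part of pandera's API, and for simplicity it treats null values as non-strings rather than deferring to a separate nullable check.

```python
import pandas as pd


def non_string_failure_cases(column: pd.Series) -> pd.Series:
    """Return the elements of `column` that are not `str` instances.

    The original index is preserved so each failure can be located.
    """
    # Element-wise isinstance check; astype(bool) guards the empty-column case.
    is_str = column.map(lambda value: isinstance(value, str)).astype(bool)
    return column[~is_str]


column = pd.Series(["a", 1, "b", 2.5], dtype=object)
failures = non_string_failure_cases(column)
print(failures)
```

An element-wise check like this is slower than a dtype comparison, so it would presumably only be applied to columns declared as `str`.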