
pa.SeriesSchema with a dtype does not verify that the actual values within a pd.Series are of that type

Open rjurney opened this issue 2 years ago • 1 comments

Describe the bug

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [ ] (optional) I have confirmed this bug exists on the master branch of pandera.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

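import pandas as pd
import pandera as pa

# Schema declaring that the Series should hold strings
schema = pa.SeriesSchema(
    str,
    nullable=True,
    unique=False,
    name="note",
)

# Every element is a list, not a str
s = pd.Series([[], [], ["hi"]])
s.name = "note"

# Passes without raising, even though no element is a str
schema.validate(s)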

Output:

In [124]: schema.validate(s)

Out[124]:
0      []
1      []
2    [hi]
Name: note, dtype: object

Expected behavior

I expect an error for all of these rows. None of them are strings: two are empty lists and one is a list of strings. I am confused about how the library is meant to work if this case passes. The pandas.Series.dtype is object, which is usually the case for strings too, and it looks like the code only checks that dtype rather than the values themselves.
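To illustrate the distinction, here is a minimal pandas-only sketch (not pandera's implementation) of the difference between checking the container dtype and checking each value. The helper name non_str_values is hypothetical and exists only for this example.

```python
import pandas as pd

s = pd.Series([[], [], ["hi"]], name="note")

# The container dtype is `object`; it says nothing about what each element is.
print(s.dtype)


def non_str_values(series: pd.Series) -> pd.Series:
    """Return the non-null elements that are not Python str instances.

    Hypothetical helper for illustration; not part of pandas or pandera.
    """
    present = series.dropna()  # mirror nullable=True by ignoring missing values
    is_str = present.map(lambda v: isinstance(v, str)).astype(bool)
    return present[~is_str]


bad = non_str_values(s)
print(bad)  # all three rows are lists, so all three are reported
```

A value-level check like this flags every row of the sample Series, which is the behavior I expected from the schema.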

Desktop (please complete the following information):

  • OS: Mac OS X Monterey 12.4

rjurney, Sep 09 '22 03:09

With version 0.17.2 I get a SchemaError, as expected:

SchemaError: expected series 'note' to have type str:
failure cases:
   index failure_case
0      0           []
1      1           []
2      2         [hi]

andreas-wolf, Nov 13 '23 19:11