
Schema with MultiIndex's subindex dtype declared as something other than 'object' fails to validate an empty dataframe

Open davidandreoletti opened this issue 3 years ago • 1 comments

Describe the bug

When an empty dataframe has an empty MultiIndex (i.e. every subindex is empty), validating it against a schema that declares a non-'object' dtype for each subindex silently converts the subindex dtypes to 'object'. Validation then fails when the schema declares, for example, the 'Int64' dtype for each subindex.

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [x] (optional) I have confirmed this bug exists on the master branch of pandera.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

import pandas as pd
import pandera


class TestSchema(pandera.SchemaModel):
    # Both MultiIndex levels are declared as nullable Int64 with coercion enabled.
    level0: pandera.typing.Index[pd.Int64Dtype] = pandera.Field(coerce=True)
    level1: pandera.typing.Index[pd.Int64Dtype] = pandera.Field(coerce=True)


# Empty dataframe whose MultiIndex has two empty levels.
data = pd.DataFrame(index=pd.MultiIndex.from_arrays([[]] * 2))
schema = TestSchema.to_schema()
schema.validate(data, lazy=False, inplace=True)
# Throws SchemaError: expected series '0' to have type Int64, got object
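
For context, the 'object' dtype in the error message can be observed with pandas alone. This is a minimal sketch, assuming a recent pandas version; it only inspects the index built in the sample above and does not involve pandera. The variable names `empty_index` and `level_dtypes` are illustrative.

```python
import pandas as pd

# Same construction as in the code sample: two empty, untyped level arrays.
empty_index = pd.MultiIndex.from_arrays([[]] * 2)
data = pd.DataFrame(index=empty_index)

# Collect the dtype of each level's values, one entry per subindex.
level_dtypes = [
    data.index.get_level_values(i).dtype for i in range(data.index.nlevels)
]
print(level_dtypes)
```

Because the input lists carry no type information, pandas falls back to 'object' for each level, which is the dtype the validation error reports.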

Expected behavior

When a schema declares a specific dtype (e.g. Int64) for the levels of a MultiIndex, an empty dataframe's MultiIndex should be coerced to that dtype and pass schema validation.
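
Until that behavior is available, one way to sidestep the untyped levels is to build the empty MultiIndex from typed arrays. The sketch below is an assumption-laden illustration, not pandera's fix: `empty_typed_multiindex` is a hypothetical helper, and it uses NumPy's `int64` rather than the nullable `Int64` dtype from the report. Whether the result passes a given schema is not checked here.

```python
import numpy as np
import pandas as pd


def empty_typed_multiindex(names, dtype="int64"):
    """Hypothetical helper: build an empty MultiIndex whose levels carry `dtype`."""
    # Empty NumPy arrays keep their dtype, unlike empty Python lists.
    arrays = [np.array([], dtype=dtype) for _ in names]
    return pd.MultiIndex.from_arrays(arrays, names=names)


typed_index = empty_typed_multiindex(["level0", "level1"])
typed_dtypes = [typed_index.get_level_values(n).dtype for n in typed_index.names]
print(typed_dtypes)
```

The level names match the field names in the schema from the code sample; the original sample leaves the levels unnamed.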

Desktop (please complete the following information):

  • OS: macOS 12.4
  • Browser: NA
  • Version: pandera v0.12

Screenshots

None

Additional context

None

davidandreoletti, Sep 07 '22 08:09

@cosmicBboy PR provided. Let me know when you want to discuss this.

davidandreoletti, Sep 07 '22 08:09