Float Nullability if Entire Column None

Open Ducky6944 opened this issue 1 year ago • 1 comments

Describe the bug

It seems that if all the values in a DataFrame column are None, validating that column as a nullable float throws an error. I'm not sure whether there's a reason for this or whether I'm doing something wrong.

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [x] (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

import pandas as pd
import pandera as pa

# Column with one float among the None values.
df = pd.DataFrame({"test": [None, 1.1, None]})
# Column where every value is None.
df1 = pd.DataFrame({"test": [None, None, None]})
schema = pa.DataFrameSchema(
    {"test": pa.Column(float, nullable=True)}
)
schema.validate(df)   # passes
print(df)
schema.validate(df1)  # throws an error
print(df1)

Expected behavior

I would expect the check to pass.

Desktop (please complete the following information):

  • OS: macOS 14.0 (23A344)
  • Version: 0.19.3

Ducky6944 avatar Jun 19 '24 21:06 Ducky6944

Setting dtype=None in pa.Column seems to circumvent this, as does explicitly setting the dtype on the DataFrame before running validation.

Ducky6944 avatar Jun 19 '24 22:06 Ducky6944

The problem is that

pd.DataFrame({"test": [None, None, None]})

is interpreted by pandas as an object column. In this case, pandera is behaving as expected, since the test column is of object dtype. If you want pandera to coerce to dtype, use the coerce argument:

schema = pa.DataFrameSchema(
    {"test": pa.Column(float, nullable=True, coerce=True)}
)

cosmicBboy avatar Jul 04 '24 20:07 cosmicBboy