pandera
Float Nullability if Entire Column None
Describe the bug
If every value in a dataframe column is None, validating it against a nullable float column raises an error. I'm not sure whether there is a reason for this or whether I am doing something wrong.
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [x] (optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample, a copy-pastable example
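import pandas as pd
import pandera as pa

# Mixed column: pandas infers float64, so validation passes.
df = pd.DataFrame({"test": [None, 1.1, None]})
# All-None column: this is the frame that fails validation.
df1 = pd.DataFrame({"test": [None, None, None]})

schema = pa.DataFrameSchema(
    {"test": pa.Column(float, nullable=True)}
)

schema.validate(df)
print(df)
schema.validate(df1)  # raises an error
print(df1)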
Expected behavior
I would expect the check to pass.
Desktop
- OS: macOS 14.0 (23A344)
- Version: 0.19.3
Setting dtype=None in pa.Column seems to circumvent this, as does explicitly setting the dtype in the dataframe before running validation.
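For anyone trying the second workaround, here is a minimal sketch of what the explicit cast does, using only pandas. The names inferred_dtype, df1_float and cast_dtype are made up for illustration and are not part of the original report.

```python
import pandas as pd

# Same all-None frame as in the report.
df1 = pd.DataFrame({"test": [None, None, None]})

# With no numeric value to infer from, pandas stores the column as object.
inferred_dtype = df1["test"].dtype

# Explicit cast: each None becomes NaN in a float64 column.
df1_float = df1.astype({"test": float})
cast_dtype = df1_float["test"].dtype

print(inferred_dtype, cast_dtype)
```

After the cast the column has the float dtype the schema asks for, and the missing values are covered by nullable=True.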
The problem is that pd.DataFrame({"test": [None, None, None]}) is interpreted by pandas as an object column. In this case, pandera is behaving as expected, since the test column is of object dtype rather than float. If you want pandera to coerce the column to the schema's dtype, use the coerce argument:
schema = pa.DataFrameSchema(
    {"test": pa.Column(float, nullable=True, coerce=True)}
)