pandera
Checks at the DataFrameSchema level do not fail when the lambda returns a Series of booleans, or a boolean with element_wise=True.
When I define a DataFrameSchema-level lambda that returns a Series of booleans, the df passes validation even if the returned Series is not all True.
def parse_date(text):
    for fmt in ('%Y-%m-%d', '%d.%m.%Y', '%d/%m/%Y'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    return None
def validate_start_before_end(df):
    result = []
    counter = -1
    for row in df['Start date']:
        counter += 1
        if not pd.isnull(df['End date'][counter]) and parse_date(df['Start date'][counter]) <= parse_date(df['End date'][counter]):
            result.append(True)
        else:
            result.append(False)
    return pd.Series(result)
schema = pa.DataFrameSchema(
    columns={
        'Start date': pa.Column(str, pa.Check(lambda s: UploadHeadCountListPerCompany.validate_dates(s), error="Wrong date format"), nullable=False),
        'End date': pa.Column(str, pa.Check(lambda s: UploadHeadCountListPerCompany.validate_dates(s), error="Wrong date format"), coerce=True, nullable=True),
    },
    # define checks at the DataFrameSchema-level
    checks=pa.Check(lambda df: UploadHeadCountListPerCompany.validate_start_before_end(df), error='Start Date should be before End Date', element_wise=False)
)
validated_df = self.schema.validate(reader)  # <-- This line should raise an exception
![image](https://user-images.githubusercontent.com/44446418/158575678-7a972d7a-e0bd-44f1-bcbf-b6ed48ec4a28.png)
![image](https://user-images.githubusercontent.com/44446418/158575767-537bdc66-8957-4670-996e-118a6d1b460b.png)
![image](https://user-images.githubusercontent.com/44446418/158576022-2c2c5f98-a7c3-4547-a474-8ecc28328845.png)
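For comparison, here is a minimal vectorized sketch of the same start-before-end check, using pandas only. It is not a diagnosis of the behavior reported above. One difference from the loop version is that the returned Series keeps the DataFrame's own index, whereas `pd.Series(result)` gets a fresh 0..n-1 index and `df['End date'][counter]` looks values up by label, so the loop assumes a default integer index. The names `DATE_FORMATS`, `parse_dates`, `start_before_end`, and the toy data are hypothetical and not part of the original report.

```python
import pandas as pd

# Same three formats as parse_date in the report
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def parse_dates(column: pd.Series) -> pd.Series:
    """Parse each value with the first matching format; anything else becomes NaT."""
    parsed = pd.to_datetime(column, format=DATE_FORMATS[0], errors="coerce")
    for fmt in DATE_FORMATS[1:]:
        # Only fill the positions that earlier formats could not parse
        parsed = parsed.fillna(pd.to_datetime(column, format=fmt, errors="coerce"))
    return parsed


def start_before_end(df: pd.DataFrame) -> pd.Series:
    """Return a boolean Series aligned with df.index."""
    start = parse_dates(df["Start date"])
    end = parse_dates(df["End date"])
    # Comparisons against NaT are False, so missing or unparseable dates fail
    return start <= end


# Toy data (assumed), deliberately using a non-default index
toy = pd.DataFrame(
    {
        "Start date": ["2022-01-01", "15.03.2022", "01/05/2022"],
        "End date": ["2022-02-01", "01.03.2022", None],
    },
    index=[10, 11, 12],
)
result = start_before_end(toy)
```

A function like this could be passed directly as `pa.Check(start_before_end, error='Start Date should be before End Date')` in the `checks=` argument, without the wrapping lambda.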
I also set element_wise=True and changed the function, but that does not work either:
def validate_start_before_end(df):
    if not pd.isnull(df['End date']) and UploadHeadCountListPerCompany.parse_date(df['Start date']) <= UploadHeadCountListPerCompany.parse_date(df['End date']):
        return True
    else:
        return False
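The row-wise variant can be exercised outside of pandera to see what it returns for each row. The sketch below assumes that a DataFrame-level check with element_wise=True calls the function once per row with that row as a Series, and emulates this with `DataFrame.apply(..., axis=1)`. The name `row_start_before_end` and the toy data are hypothetical. It also returns False when either date cannot be parsed, instead of comparing against None.

```python
from datetime import datetime

import pandas as pd

DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def parse_date(text):
    """Return a datetime for the first matching format, or None."""
    if pd.isnull(text):
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def row_start_before_end(row) -> bool:
    """Check a single row; a missing or unparseable date counts as a failure."""
    start = parse_date(row["Start date"])
    end = parse_date(row["End date"])
    if start is None or end is None:
        return False
    return start <= end


# Toy data (assumed): one valid row, one reversed pair, one missing end date
toy = pd.DataFrame(
    {
        "Start date": ["2022-01-01", "15.03.2022", "01/05/2022"],
        "End date": ["2022-02-01", "01.03.2022", None],
    }
)
# Emulate a row-wise check: one call per row, one boolean per row
row_results = toy.apply(row_start_before_end, axis=1)
```

If `row_results` contains False values here but validation still passes, that narrows the problem down to how the check is wired into the schema instead of the function itself.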
Can you provide a minimal, reproducible (copy-pasteable) example with toy data?