pandera
DataFrameSchema.from_yaml ignores "ordered"
We are using DataFrameSchema.from_yaml, and our schema YAML file includes the line ordered: true. However, the column order was not being validated. The string representation of the resulting DataFrameSchema shows ordered=False, so the setting from the YAML file is not being applied.
It looks to us like the cause may be at https://github.com/unionai-oss/pandera/blob/master/pandera/io.py#L262
    return DataFrameSchema(
        columns=columns,
        checks=checks,
        index=index,
        coerce=serialized_schema.get("coerce", False),
        strict=serialized_schema.get("strict", False),
        unique=serialized_schema.get("unique", None),
    )
This call does not pass ordered, so the ordered: true in our YAML file is ignored and the default of ordered=False is used.
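The mechanism can be illustrated without pandera itself. The sketch below is only an illustration: MiniSchema and deserialize_without_ordered are hypothetical stand-ins, not pandera code. It shows how a key that the deserializer never reads silently falls back to the constructor default.

```python
from dataclasses import dataclass


@dataclass
class MiniSchema:
    """Hypothetical stand-in for DataFrameSchema; not pandera's class."""

    coerce: bool = False
    strict: bool = False
    ordered: bool = False  # constructor default, matching the report


def deserialize_without_ordered(serialized: dict) -> MiniSchema:
    # Same shape as the quoted snippet: "ordered" is never read from the dict.
    return MiniSchema(
        coerce=serialized.get("coerce", False),
        strict=serialized.get("strict", False),
    )


# Parsed form of a YAML file that contains `ordered: true`.
parsed_yaml = {"coerce": False, "strict": True, "ordered": True}

schema = deserialize_without_ordered(parsed_yaml)
print(schema)  # MiniSchema(coerce=False, strict=True, ordered=False)
```

The strict value survives because it is read explicitly, while ordered is dropped, which matches what we see in the string representation of our schema.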
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandera.
- I have confirmed this bug exists on the master branch of pandera.
hi @zak1632, thanks for reporting this issue!
Yeah, basically we need to add the ordered key in _serialize_schema:
- https://github.com/unionai-oss/pandera/blob/master/pandera/io.py#L156
And then add ordered=serialized_schema.get("ordered", False) in _deserialize_schema:
- https://github.com/unionai-oss/pandera/blob/master/pandera/io.py#L268
And finally update the test_io tests to catch this case.
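As a rough sketch of those three steps, assuming a simplified stand-in rather than the real pandera internals: MiniSchema, serialize, deserialize and test_ordered_round_trip below are hypothetical names, and the actual _serialize_schema and _deserialize_schema handle many more fields.

```python
from dataclasses import dataclass


@dataclass
class MiniSchema:
    """Hypothetical stand-in for DataFrameSchema; not pandera's class."""

    coerce: bool = False
    strict: bool = False
    ordered: bool = False


def serialize(schema: MiniSchema) -> dict:
    # Step 1: write the "ordered" key alongside the other options.
    return {
        "coerce": schema.coerce,
        "strict": schema.strict,
        "ordered": schema.ordered,
    }


def deserialize(serialized: dict) -> MiniSchema:
    # Step 2: read "ordered" back, defaulting to False when the key is absent.
    return MiniSchema(
        coerce=serialized.get("coerce", False),
        strict=serialized.get("strict", False),
        ordered=serialized.get("ordered", False),
    )


def test_ordered_round_trip() -> None:
    # Step 3: a round trip must preserve the flag for both values.
    for ordered in (True, False):
        original = MiniSchema(ordered=ordered)
        assert deserialize(serialize(original)) == original
    # Older files without the key keep the previous default.
    assert deserialize({}).ordered is False


test_ordered_round_trip()
```

Defaulting to False on read keeps existing YAML files that lack the key behaving as before.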
Would you be open to making a PR for this?
#943 will close this issue