DataFrameSchema.from_yaml ignores "ordered"

Open zak1632 opened this issue 3 years ago • 2 comments

We are using DataFrameSchema.from_yaml, and our schema YAML file includes the line ordered: true. However, the column order was not being validated. The string representation of the resulting DataFrameSchema shows ordered=False, which is not what we expected.

It looks to us like the cause may be at https://github.com/unionai-oss/pandera/blob/master/pandera/io.py#L262

    return DataFrameSchema(
        columns=columns,
        checks=checks,
        index=index,
        coerce=serialized_schema.get("coerce", False),
        strict=serialized_schema.get("strict", False),
        unique=serialized_schema.get("unique", None),

This call does not pass ordered, so the ordered: true in our YAML file is ignored and the default of ordered=False is used.
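To illustrate the failure mode without depending on pandera itself, here is a minimal sketch. SchemaStandIn and deserialize_without_ordered are hypothetical names made up for this example, not pandera's actual code; they only mimic the shape of the excerpt above.

```python
from dataclasses import dataclass


@dataclass
class SchemaStandIn:
    """Hypothetical stand-in for DataFrameSchema; only the flags matter here."""

    coerce: bool = False
    strict: bool = False
    ordered: bool = False


def deserialize_without_ordered(serialized_schema: dict) -> SchemaStandIn:
    # Mirrors the shape of the excerpt above: "ordered" is never read from
    # the parsed YAML, so the constructor default (False) is always used.
    return SchemaStandIn(
        coerce=serialized_schema.get("coerce", False),
        strict=serialized_schema.get("strict", False),
    )


# Assumed result of parsing a YAML file that contains "ordered: true".
parsed_yaml = {"coerce": True, "strict": False, "ordered": True}
schema = deserialize_without_ordered(parsed_yaml)
print(schema)
```

The key is present in the parsed dictionary but never reaches the constructor, which matches what we see in the schema's string representation.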

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • I have confirmed this bug exists on the master branch of pandera.

zak1632, Aug 11 '22 13:08

hi @zak1632, thanks for reporting this issue!

Yeah, basically we need to add the ordered key in _serialize_schema:

  • https://github.com/unionai-oss/pandera/blob/master/pandera/io.py#L156

And then add an ordered=serialized_schema.get("ordered", False) in _deserialize_schema:

  • https://github.com/unionai-oss/pandera/blob/master/pandera/io.py#L268

And finally update the test_io tests to catch this case.
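Roughly, the three steps could look like the sketch below. This is not the actual io.py code: SchemaStandIn, serialize_schema, deserialize_schema, and the test name are hypothetical stand-ins, and only the ordered=serialized_schema.get("ordered", False) line is taken from the suggestion above.

```python
from dataclasses import dataclass


@dataclass
class SchemaStandIn:
    """Hypothetical stand-in for DataFrameSchema; only the flags matter here."""

    coerce: bool = False
    strict: bool = False
    ordered: bool = False


def serialize_schema(schema: SchemaStandIn) -> dict:
    # Step 1: write the "ordered" key alongside the other flags.
    return {
        "coerce": schema.coerce,
        "strict": schema.strict,
        "ordered": schema.ordered,
    }


def deserialize_schema(serialized_schema: dict) -> SchemaStandIn:
    # Step 2: read "ordered" back, defaulting to False when the key is
    # missing so that older serialized schemas still load.
    return SchemaStandIn(
        coerce=serialized_schema.get("coerce", False),
        strict=serialized_schema.get("strict", False),
        ordered=serialized_schema.get("ordered", False),
    )


def test_ordered_round_trip() -> None:
    # Step 3: a round-trip test that fails if either side drops the key.
    original = SchemaStandIn(ordered=True)
    restored = deserialize_schema(serialize_schema(original))
    assert restored.ordered is True
    # Files without the key keep the existing default.
    assert deserialize_schema({}).ordered is False


test_ordered_round_trip()
```

Defaulting to False in the .get call keeps YAML files written before the change loading exactly as they do today.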

Would you be open to making a PR for this?

cosmicBboy, Aug 11 '22 14:08

#943 will close this issue

dstumpy, Sep 14 '22 20:09