Custom checks lost after to_yaml
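Describe the bug
Custom checks passed to a DataFrameSchema are dropped when the schema is serialized with to_yaml(), so the YAML output cannot be used to load the full schema again.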
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [x] (optional) I have confirmed this bug exists on the master branch of pandera.
Code Sample, a copy-pastable example
import pandera as pa
def low_lt_high(df):
    # Dataframe-level custom check: in each row, "low" must not exceed "high"
    return df['low'] <= df['high']
schema = pa.DataFrameSchema(
    columns={"close": pa.Column(float, checks=[pa.Check.gt(0.0)])},
    checks=[pa.Check(low_lt_high)],
)
# low_lt_high does not appear in the printed YAML
print(schema.to_yaml())
Expected behavior
Custom check rules are kept in the YAML output so that the schema can be loaded again with all of its checks.
Desktop:
- OS: MacOS
- Version 0.12.0
Any plans for this? I'm also running into this problem. I have modular schemas that inherit from each other, and I want to give third-party users an easy way to get the YAML for a schema and see everything in one place. It's a massive convenience that would help adoption.
Here's a simpler example without inheritance:
import pandera as pa
import pandera.extensions as extensions
@extensions.register_check_method(statistics=["cls"])
def non_null_values_in_extra_columns(df, cls):
    """Check that every column not defined in the schema contains no null values."""
    # Get the columns defined in the schema
    defined_columns = cls.to_schema().columns.keys()
    # Find columns in the DataFrame that are not defined in the schema
    extra_columns = [col for col in df.columns if col not in defined_columns]
    # Check that all values in these extra columns are not null
    return df[extra_columns].notnull().all().all()
class TestSchema(pa.DataFrameModel):
    @pa.dataframe_check
    def check_non_null_values_in_extra_columns(cls, df):
        # Run the registered check, passing the model class as its statistic
        res = pa.Check.non_null_values_in_extra_columns(cls)(df)
        return res.check_passed
print(TestSchema.to_yaml())
The expected behavior would be for the registered method non_null_values_in_extra_columns to show up in the YAML output. If I include the check in the Config, it does show up in the YAML output, but there is no way to reference cls that way. With inheritance, I would also have to restate all of the inherited Config settings, or else they would get overwritten (as far as I can tell, there's no way to append to a Config apart from the metadata).
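One possible direction, offered as an assumption rather than an existing pandera feature: the cls statistic is a Python class, which has no obvious YAML representation, whereas a plain list of column names does. The sketch below rewrites the check to take a hypothetical defined_columns list instead of the model class. It uses only pandas; registering it with register_check_method(statistics=["defined_columns"]) would be the assumed next step.

```python
import pandas as pd

# Hypothetical variant: takes the defined column names as a plain list of
# strings (serializable) instead of the DataFrameModel class (`cls`).
def non_null_values_in_extra_columns(df: pd.DataFrame, *, defined_columns: list) -> bool:
    """Return True if every column not listed in `defined_columns` has no nulls."""
    extra_columns = [col for col in df.columns if col not in defined_columns]
    # Selecting no columns yields an empty frame, whose .all().all() is True,
    # so frames without extra columns pass.
    return bool(df[extra_columns].notnull().all().all())

defined = ["a"]
ok = pd.DataFrame({"a": [1, None], "extra": [1, 2]})   # nulls only in a defined column
bad = pd.DataFrame({"a": [1, 2], "extra": [1, None]})  # null in an extra column
print(non_null_values_in_extra_columns(ok, defined_columns=defined))   # True
print(non_null_values_in_extra_columns(bad, defined_columns=defined))  # False
```

The trade-off is that the column list must be supplied explicitly (for example from the model's to_schema().columns at definition time) rather than looked up from cls while the check runs.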
@cosmicBboy, I hope you don't mind me tagging you, but I was wondering whether you had any feedback or thoughts on this issue.