Custom checks simply returning the wrong checks!
Describe the bug
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [ ] (optional) I have confirmed this bug exists on the main branch of pandera.
I have registered two custom checks, date_in_current_cycle and date_conforms_to_format, with the pandera_register_custom_check method. However, when I inspect the checks on the columns generated by Field, the wrong check is returned!
Code Sample, a copy-pastable example
from datetime import date, datetime

from pandera import DataFrameModel, Field

# Assumes date_in_current_cycle has already been registered as a custom check.
class _MyTestDataSchema(DataFrameModel):
    numeric_col: float = Field(nullable=False, le=10)
    date_col: date = Field(
        nullable=False,
        date_in_current_cycle={"date_format": "%Y-%m-%d"},
    )
    datetime_col: datetime = Field(
        nullable=False,
        date_in_current_cycle={"date_format": "%Y-%m-%d %H:%M:%S"},
    )

# Print each column together with the checks attached to it.
print("\n")
for _, column in _MyTestDataSchema.to_schema().columns.items():
    print(column, column.checks)
This prints:
<Schema Column(name=numeric_col, type=DataType(float64))> [<Check less_than_or_equal_to: less_than_or_equal_to(10)>]
<Schema Column(name=date_col, type=DataType(date))> [<Check date_conforms_to_format>]
<Schema Column(name=datetime_col, type=DataType(datetime64[ns]))> [<Check date_conforms_to_format>]
Expected behavior
I would expect column.checks to return the check I specified, not a completely different one that I do not even reference in the class. When I run dir(Check) on the global Check object, I can see that both checks are registered:
['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slotnames__', '__str__', '__subclasshook__', '__weakref__', '_get_check_fn_code', 'between', 'coherent_ghg_scope', 'coherent_locations', 'date_conforms_to_format', 'date_in_current_cycle', 'eq', 'equal_to', 'equal_to', 'error_if_duplicates', 'from_builtin_check_name', 'ge', 'get_backend', 'get_builtin_check_fn', 'greater_than', 'greater_than', 'greater_than_or_equal_to', 'greater_than_or_equal_to', 'gt', 'in_range', 'in_range', 'is_builtin_check', 'is_taxonomy', 'is_valid_country', 'is_valid_global_region', 'isin', 'isin', 'le', 'less_than', 'less_than', 'less_than_or_equal_to', 'less_than_or_equal_to', 'lt', 'ne', 'not_equal_to', 'not_equal_to', 'notin', 'notin', 'one_sample_ttest', 'register_backend', 'register_builtin_check_fn', 'str_contains', 'str_contains', 'str_endswith', 'str_endswith', 'str_length', 'str_length', 'str_matches', 'str_matches', 'str_startswith', 'str_startswith', 'two_sample_ttest', 'unique_values_eq', 'unique_values_eq', 'warn_and_remove_negative_values', 'warn_and_remove_zero_values']
Desktop (please complete the following information):
- OS: MacOS
- Pandera Version: 0.24.0
Can you share the check registration code?
Hi @cosmicBboy, thanks, my check registration code is as follows: I have an Enum called TransformativeChecks which looks like
class TransformativeChecks(StrEnum):
    DATE_IN_CURRENT_CYCLE = "date_in_current_cycle"
    DATE_CONFORMS_TO_FORMAT = "date_conforms_to_format"
This maps to another Enum called PANDERA_CHECK_IMPLEMENTATION_MAP, which maps each check name in TransformativeChecks to the function, defined in the namespace, that implements that check.
Then I add the functions to the namespace, running into the same issue with either locals or globals (I can confirm there is no naming conflict in my codebase):
for check_enum in TransformativeChecks:
    if hasattr(Check, check_enum.value):
        continue
    # Register the custom check
    implementation_fn = PANDERA_CHECK_IMPLEMENTATION_MAP[check_enum]
    locals()[check_enum.value] = register_check_method(
        create_wrapper_fn(implementation_fn, check_enum.value)
    )
When I debug inside the loop for check_enum in TransformativeChecks I can see that both the referenced functions and names are as expected.
Ah I've found out what was going on! Inside the function create_wrapper_fn, I assigned the wrapper function to a deepcopy of an identity transformation, and then modified the annotations and name to what I thought was a new function:
from copy import deepcopy
from typing import Any, Callable

import pandas as pd

type AnonFn[T] = Callable[..., T]

def create_wrapper_fn(impl_fn: AnonFn[Any], name: str) -> AnonFn[pd.Series]:
    # deepcopy returns the very same function object, not a new one
    wrapper_fn = deepcopy(identity_transformation)
    wrapper_fn.__annotations__ = impl_fn.__annotations__
    wrapper_fn.__name__ = name
    return wrapper_fn
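However, each wrapper_fn still referenced the same function object despite the deepcopy: the copy module treats functions as atomic and hands back the original, so every call renamed identity_transformation itself and the last name assigned won.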
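The shared-object behaviour can be reproduced with the standard library alone. This is a minimal sketch, assuming a stand-in identity_transformation (the real helper is not shown in this thread); it only illustrates that deepcopy hands back the same function object, so renaming one "copy" renames them all.

```python
from copy import deepcopy


def identity_transformation(series, **kwargs):
    # Stand-in for the real helper, which is not shown in the thread.
    return series


first = deepcopy(identity_transformation)
first.__name__ = "date_in_current_cycle"

second = deepcopy(identity_transformation)
second.__name__ = "date_conforms_to_format"

# Both "copies" are the original function object, so the last rename wins.
print(first is second is identity_transformation)  # True
print(first.__name__)  # date_conforms_to_format
```

This would be consistent with every column reporting date_conforms_to_format, the last name in the enum.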
So I fixed it by defining the wrapper function explicitly:
def wrapper_fn(series: pd.Series, **kwargs: Any) -> pd.Series:
    return identity_transformation(series, **kwargs)
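For completeness, here is one way the whole factory might look with that fix in place. It is a sketch under assumptions, not the author's actual code: identity_transformation, in_cycle and conforms are hypothetical stand-ins, the types are loosened to Any so the example runs without pandas, and copying the annotations is carried over from the earlier version rather than confirmed for the fixed one.

```python
from typing import Any, Callable


def identity_transformation(values: Any, **kwargs: Any) -> Any:
    # Stand-in for the real helper, which is not shown in the thread.
    return values


def create_wrapper_fn(impl_fn: Callable[..., Any], name: str) -> Callable[..., Any]:
    # A nested def creates a new function object on every call.
    def wrapper_fn(values: Any, **kwargs: Any) -> Any:
        return identity_transformation(values, **kwargs)

    # Copy, rather than alias, the annotations of the implementation.
    wrapper_fn.__annotations__ = dict(impl_fn.__annotations__)
    wrapper_fn.__name__ = name
    return wrapper_fn


def in_cycle(values: list, date_format: str) -> list:
    # Hypothetical check implementation.
    return values


def conforms(values: list, date_format: str) -> list:
    # Hypothetical check implementation.
    return values


a = create_wrapper_fn(in_cycle, "date_in_current_cycle")
b = create_wrapper_fn(conforms, "date_conforms_to_format")
print(a is b, a.__name__, b.__name__)
# False date_in_current_cycle date_conforms_to_format
```

Because each call builds its own wrapper_fn, setting __name__ on one wrapper no longer affects the others.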
Thanks for your help!