Custom checks simply returning the wrong checks!

Open JungeWerther opened this issue 5 months ago • 3 comments

Describe the bug

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [ ] (optional) I have confirmed this bug exists on the main branch of pandera.

I have registered two custom checks date_in_current_cycle and date_conforms_to_format with the pandera_register_custom_check method. When I look at the checks on the columns generated by Field, however, it will simply return the wrong check!

Code Sample, a copy-pastable example

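# Imports are assumed from context; they were not part of the original snippet.
# The custom check date_in_current_cycle must already be registered.
from datetime import date, datetime

from pandera import DataFrameModel, Field


class _MyTestDataSchema(DataFrameModel):
    numeric_col: float = Field(nullable=False, le=10)
    date_col: date = Field(
        nullable=False,
        date_in_current_cycle={"date_format": "%Y-%m-%d"}
    )
    datetime_col: datetime = Field(
        nullable=False,
        date_in_current_cycle={"date_format": "%Y-%m-%d %H:%M:%S"}
    )


# Print each column together with the checks attached to it.
print("\n")
for _, column in _MyTestDataSchema.to_schema().columns.items():
    print(column, column.checks)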

Will print

<Schema Column(name=numeric_col, type=DataType(float64))> [<Check less_than_or_equal_to: less_than_or_equal_to(10)>]
<Schema Column(name=date_col, type=DataType(date))> [<Check date_conforms_to_format>]
<Schema Column(name=datetime_col, type=DataType(datetime64[ns]))> [<Check date_conforms_to_format>]

Expected behavior

I would expect column.checks to return the check I declared on the field, not a totally different one that I do not even reference in the class. When I run dir(Check) on the global check object, I can see that both checks are registered:

['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slotnames__', '__str__', '__subclasshook__', '__weakref__', '_get_check_fn_code', 'between', 'coherent_ghg_scope', 'coherent_locations', 'date_conforms_to_format', 'date_in_current_cycle', 'eq', 'equal_to', 'equal_to', 'error_if_duplicates', 'from_builtin_check_name', 'ge', 'get_backend', 'get_builtin_check_fn', 'greater_than', 'greater_than', 'greater_than_or_equal_to', 'greater_than_or_equal_to', 'gt', 'in_range', 'in_range', 'is_builtin_check', 'is_taxonomy', 'is_valid_country', 'is_valid_global_region', 'isin', 'isin', 'le', 'less_than', 'less_than', 'less_than_or_equal_to', 'less_than_or_equal_to', 'lt', 'ne', 'not_equal_to', 'not_equal_to', 'notin', 'notin', 'one_sample_ttest', 'register_backend', 'register_builtin_check_fn', 'str_contains', 'str_contains', 'str_endswith', 'str_endswith', 'str_length', 'str_length', 'str_matches', 'str_matches', 'str_startswith', 'str_startswith', 'two_sample_ttest', 'unique_values_eq', 'unique_values_eq', 'warn_and_remove_negative_values', 'warn_and_remove_zero_values']

Desktop (please complete the following information):

  • OS: MacOS
  • Pandera Version: 0.24.0

JungeWerther, Jun 03 '25 18:06

can you share the check registration code?

cosmicBboy, Jun 03 '25 19:06

Hi @cosmicBboy, thanks. My check registration code is as follows. I have an Enum called TransformativeChecks, which looks like this:

from enum import StrEnum

class TransformativeChecks(StrEnum):
    DATE_IN_CURRENT_CYCLE = "date_in_current_cycle"
    DATE_CONFORMS_TO_FORMAT = "date_conforms_to_format"

Its members map to another Enum called PANDERA_CHECK_IMPLEMENTATION_MAP, which maps each check name in TransformativeChecks to the function in the namespace that implements that check.

Then I add the functions to the namespace. I run into the same issue whether I use locals or globals (I can confirm there is no naming conflict in my codebase):

    for check_enum in TransformativeChecks:
        if hasattr(Check, check_enum.value):
            continue
       
        # Register the custom check
        implementation_fn = PANDERA_CHECK_IMPLEMENTATION_MAP[check_enum]
        locals()[check_enum.value] = register_check_method(
            create_wrapper_fn(implementation_fn, check_enum.value)
        )

When I debug inside the loop for check_enum in TransformativeChecks I can see that both the referenced functions and names are as expected.

JungeWerther, Jun 04 '25 07:06

Ah I've found out what was going on! Inside the function create_wrapper_fn, I assigned the wrapper function to a deepcopy of an identity transformation, and then modified the annotations and name to what I thought was a new function:

from copy import deepcopy
from typing import Any, Callable

import pandas as pd

type AnonFn[T] = Callable[..., T]

def create_wrapper_fn(impl_fn: AnonFn[Any], name: str) -> AnonFn[pd.Series]:
    wrapper_fn = deepcopy(identity_transformation)
    wrapper_fn.__annotations__ = impl_fn.__annotations__
    wrapper_fn.__name__ = name
    return wrapper_fn

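However, each wrapper_fn still referenced the same mutable object despite the deepcopy: copy.deepcopy treats functions as atomic and returns the original function object rather than a copy. Every call therefore renamed and re-annotated the one shared function, so the name assigned last (date_conforms_to_format) showed up on every check.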
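The aliasing can be reproduced without pandera. This is a minimal sketch, assuming a stand-in `identity_transformation` (the real one is not shown in this thread); it only demonstrates that `copy.deepcopy` hands back the same function object, so renaming one "copy" renames them all.

```python
from copy import deepcopy


def identity_transformation(series, **kwargs):
    # Hypothetical stand-in for the real helper, which is not shown in the thread.
    return series


first = deepcopy(identity_transformation)
first.__name__ = "date_in_current_cycle"

second = deepcopy(identity_transformation)
second.__name__ = "date_conforms_to_format"

# deepcopy returned the original function both times, so the last rename wins.
print(first is second is identity_transformation)  # True
print(first.__name__)  # date_conforms_to_format
```

Comparing the objects with `is` (or `id()`) across loop iterations exposes this kind of sharing, which inspecting one iteration at a time does not.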

So I fixed it by defining the wrapper function explicitly:

def wrapper_fn(series: pd.Series, **kwargs: Any) -> pd.Series:
    return identity_transformation(series, **kwargs)
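For context, here is one way the explicit wrapper could sit inside the factory. This is a sketch under assumptions, not the code from my repository: `identity_transformation`, `in_cycle` and `conforms` are hypothetical stand-ins, plain `Any` replaces `pd.Series` to keep it dependency-free, and whether the name and annotations are still set afterwards is assumed. The point is that a `def` inside the factory runs on every call and so produces a new function object each time.

```python
from typing import Any, Callable


def identity_transformation(series: Any, **kwargs: Any) -> Any:
    # Hypothetical stand-in for the real helper.
    return series


def create_wrapper_fn(impl_fn: Callable[..., Any], name: str) -> Callable[..., Any]:
    # This def executes on each call, so every wrapper is a distinct object.
    def wrapper_fn(series: Any, **kwargs: Any) -> Any:
        return identity_transformation(series, **kwargs)

    # Copy the annotations dict so wrappers do not share it with impl_fn.
    wrapper_fn.__annotations__ = dict(impl_fn.__annotations__)
    wrapper_fn.__name__ = name
    return wrapper_fn


def in_cycle(series: Any, date_format: str) -> Any:
    # Hypothetical check implementation.
    return series


def conforms(series: Any, date_format: str) -> Any:
    # Hypothetical check implementation.
    return series


a = create_wrapper_fn(in_cycle, "date_in_current_cycle")
b = create_wrapper_fn(conforms, "date_conforms_to_format")

print(a is b)  # False
print(a.__name__, b.__name__)  # date_in_current_cycle date_conforms_to_format
```

Each registered name now points at its own function, so a later rename cannot leak into an earlier registration.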

Thanks for your help!

JungeWerther, Jun 04 '25 07:06