WhyLogs integration: support stateful pandera schemas backed by whylogs profiles

Open cosmicBboy opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe.

Currently, pandera schemas are stateless: a schema validates data only against rules that are fully defined in code.

This is great, but it rules out use cases that depend on the data, or aggregates of the data, that have passed through a particular checkpoint in a user's data processing pipeline.

Describe the solution you'd like

With whylogs, you can aggregate data into profiles in batch or streaming fashion (e.g. tracking the mean value of a column in a dataframe). pandera could then apply validation rules both to the actual data flowing through the pipeline and to the data profile that whylogs produces, which could potentially span all of the data that has passed through a particular checkpoint.
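
To make the idea concrete, here is a rough sketch of what "stateful" validation could look like. It uses plain Python only: `ColumnProfile` is a hypothetical stand-in for a whylogs column profile, and `validate_batch`, `min_value`, and `max_drift` are made-up names for illustration. None of this is existing pandera or whylogs API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ColumnProfile:
    """Hypothetical stand-in for a whylogs column profile: a running summary."""

    count: int = 0
    total: float = 0.0

    def track(self, values: Iterable[float]) -> None:
        # Fold new values into the running aggregate (batch or streaming).
        for value in values:
            self.count += 1
            self.total += value

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def validate_batch(
    values: List[float],
    profile: ColumnProfile,
    *,
    min_value: float,
    max_drift: float,
) -> List[str]:
    """Return a list of error messages; an empty list means the batch passed."""
    errors: List[str] = []

    # Stateless rule: fully defined in code, like today's pandera checks.
    below = [v for v in values if v < min_value]
    if below:
        errors.append(f"{len(below)} value(s) below {min_value}")

    # Stateful rule: compare this batch to the profile of earlier data.
    if values and profile.count:
        batch_mean = sum(values) / len(values)
        drift = abs(batch_mean - profile.mean)
        if drift > max_drift:
            errors.append(f"batch mean drifted by {drift:.2f} (> {max_drift})")

    # Assumption of this sketch: only valid batches update the profile.
    if not errors:
        profile.track(values)
    return errors


profile = ColumnProfile()
first = validate_batch([10.0, 12.0, 11.0], profile, min_value=0.0, max_drift=5.0)
second = validate_batch([30.0, 32.0], profile, min_value=0.0, max_drift=5.0)
third = validate_batch([12.0, 13.0], profile, min_value=0.0, max_drift=5.0)
```

In this sketch the first and third batches pass, while the second is rejected because its mean is far from the mean of the data seen so far. Whether a rejected batch should still be folded into the profile is a design question the real integration would need to settle.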

cosmicBboy avatar Jun 21 '22 16:06 cosmicBboy