pandera
Parameterized field names
Question about pandera
Is there a way to parameterize the field names? For example, if I'm making a schema to check whether data is a panel dataset, I'd like the entity_id column name to be parameterizable.
Here's an example of the DataFrameModel:
```python
import pandera as pa
from pandera.typing import Series

class PanelSchema(pa.DataFrameModel):
    entity_col: Series[str] = pa.Field(coerce=True, nullable=False)
    date: Series[pa.DateTime] = pa.Field(coerce=True, nullable=False)

    class Config:
        unique = ["entity_col", "date"]  # one row per (entity, date) pair
        strict = False  # allow extra value columns
        metadata: dict = {}
```
And here's how I'd like to use it, although I'm open to other patterns that accomplish the same thing:
```python
PanelSchema(entity_col="customer_id").validate(data)
```
As an added question, would Pandera be open to a contrib module? I think an inheritable PanelSchema would be helpful for a lot of use cases. For example, multivariate time series, discrete-time survival analysis, and cohort datasets can all be framed as panel datasets.