[FEATURE]: Add `ignored_columns` to Evidently StandardStep to filter dataframe columns
Contact Details [Optional]
Describe the feature you'd like
ZenML currently supports drift detection using Evidently. We created a standard step that can be used to generate and access Evidently's core visualisations here: zenml/src/zenml/integrations/evidently/steps/evidently_profile.py. (Learn more about Evidently and drift detection by checking out our example here.)
The entrypoint function for the EvidentlyProfileStep currently takes a reference dataset and a comparison dataset. We would like to add an extra optional argument, `ignored_columns` (most likely a list of strings), naming columns that would then be excluded when making the drift comparison.
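A minimal sketch of the requested behaviour, assuming both datasets are pandas DataFrames. The helper name `drop_ignored_columns` is hypothetical and is not part of ZenML or Evidently; it only illustrates filtering both inputs before the drift comparison.

```python
from typing import List, Optional, Tuple

import pandas as pd


def drop_ignored_columns(
    reference: pd.DataFrame,
    comparison: pd.DataFrame,
    ignored_columns: Optional[List[str]] = None,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Return copies of both datasets without the ignored columns.

    Hypothetical helper for illustration only. Column names that are
    absent from a dataset are skipped rather than raising an error.
    """
    if not ignored_columns:
        # Nothing to filter: hand the inputs back unchanged.
        return reference, comparison
    return (
        reference.drop(columns=ignored_columns, errors="ignore"),
        comparison.drop(columns=ignored_columns, errors="ignore"),
    )
```

Inside the step, the filtered frames would be passed on to the drift comparison in place of the originals. Silently skipping unknown column names is a design choice here; raising an error instead would surface typos earlier.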
Is your feature request related to a problem?
No response
How do you solve your current problem with the current status-quo of ZenML?
Currently you would have to do the pre-processing in a separate step or process (i.e. removing the specific columns).
Any other comments?
This is related to #ENG-328 (Jira).
Hey @strickvl
When passing a reference dataset and a comparison dataset, if you keep the reference columns identical to the comparison columns, then there is no point in adding ignored_columns. The user has to manage this scenario themselves, so I can't see a specific use case for it. Please let me know your thoughts. I would like to contribute.
Thanks,
@ketangangal What about the case where you want to calculate drift on only certain columns rather than all of them? For example, if you have a mixed dataset and only want to focus on the numerical columns? I think there is a use case for it.
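To make the mixed-dataset example concrete, here is a rough sketch of how the list of columns to ignore could be derived with pandas. The function name `non_numeric_columns` and the sample data are made up for illustration.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> list:
    """List the columns whose dtype is not numeric (hypothetical helper)."""
    numeric = set(df.select_dtypes(include="number").columns)
    # Preserve the original column order in the result.
    return [col for col in df.columns if col not in numeric]


# Made-up mixed dataset: two numerical columns and one text column.
df = pd.DataFrame(
    {
        "age": [31, 45],
        "income": [52000.0, 61000.5],
        "city": ["Berlin", "Munich"],
    }
)

# Candidate value for the proposed ignored_columns argument.
ignored_columns = non_numeric_columns(df)
```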
@htahir1 Working on it now.