
[FEATURE]: Add `ignored_columns` to Evidently StandardStep to filter dataframe columns

Open strickvl opened this issue 3 years ago • 3 comments

Contact Details [Optional]

[email protected]

Describe the feature you'd like

ZenML currently implements a way to detect drift using Evidently. We created a standard step that can be used to generate and access Evidently's core visualisations here: zenml/src/zenml/integrations/evidently/steps/evidently_profile.py. (Learn more about Evidently and drift detection by checking out our example here.)

The entrypoint function for the EvidentlyProfileStep currently takes a reference dataset and a comparison dataset. We would like to add an extra optional argument where users could pass in ignored_columns (a List of strings, most likely) which would then be ignored when making the drift comparison.
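A minimal sketch of what the requested filtering could look like, using plain pandas. The helper name `drop_ignored_columns` and the sample dataframes are hypothetical and only illustrate the idea; this is not the existing `EvidentlyProfileStep` code, and the real step might handle the argument differently.

```python
from typing import List, Optional, Tuple

import pandas as pd


def drop_ignored_columns(
    reference: pd.DataFrame,
    comparison: pd.DataFrame,
    ignored_columns: Optional[List[str]] = None,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Return copies of both dataframes without the ignored columns.

    Hypothetical helper: the name and signature are assumptions,
    not part of ZenML or Evidently.
    """
    if not ignored_columns:
        # Nothing to ignore, so pass both datasets through unchanged.
        return reference, comparison

    # Fail early if a requested column is absent from either dataset.
    missing = [
        col
        for col in ignored_columns
        if col not in reference.columns or col not in comparison.columns
    ]
    if missing:
        raise ValueError(f"Columns not present in both datasets: {missing}")

    # DataFrame.drop returns a new dataframe; the inputs are not modified.
    return (
        reference.drop(columns=ignored_columns),
        comparison.drop(columns=ignored_columns),
    )


# Made-up sample data for illustration.
reference = pd.DataFrame({"id": [1, 2], "age": [30, 41], "city": ["a", "b"]})
comparison = pd.DataFrame({"id": [3, 4], "age": [29, 52], "city": ["b", "c"]})

ref_filtered, comp_filtered = drop_ignored_columns(
    reference, comparison, ignored_columns=["id"]
)
```

The filtered dataframes could then be handed to the drift comparison in place of the originals, so the ignored columns never reach Evidently.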

Is your feature request related to a problem?

No response

How do you solve your current problem with the current status-quo of ZenML?

Currently you would have to do the pre-processing in a separate step or process (i.e. removing the specific columns).

Any other comments?

This is related to #ENG-328 (Jira).

strickvl avatar May 17 '22 11:05 strickvl

Hey @strickvl

When passing a reference dataset and a comparison dataset, if you keep the reference columns equal to the comparison columns, then there is no point in adding ignored_columns. The user has to manage this scenario. I can't see any specific use case for it. Please let me know more about your thoughts. I would like to contribute.

Thanks,

ketangangal avatar May 19 '22 07:05 ketangangal

@ketangangal What about the case when you want to calculate drift not on all columns but only on certain ones? For example, if you have a mixed dataset and only want to focus on the numerical columns? I think there is a use case for it.
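To illustrate the mixed-dataset case, here is a rough sketch of how the list of columns to ignore could be derived from dtypes with pandas. The helper `non_numeric_columns` and the sample data are hypothetical, used only to show the idea.

```python
from typing import List

import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> List[str]:
    """List the columns whose dtype is not numeric (hypothetical helper)."""
    return list(df.select_dtypes(exclude="number").columns)


# Made-up mixed dataset: two numerical columns and one text column.
mixed = pd.DataFrame(
    {"age": [30, 41], "income": [1.5, 2.5], "city": ["a", "b"]}
)

# These names could be passed as ignored_columns.
ignored = non_numeric_columns(mixed)

# Dropping them leaves only the numerical columns for the drift comparison.
numeric_only = mixed.drop(columns=ignored)
```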

htahir1 avatar May 21 '22 14:05 htahir1

@htahir1 Working on it now.

ketangangal avatar May 24 '22 09:05 ketangangal