
Tabular Data Drift By Binary Classifier

Open Miriam2040 opened this issue 1 year ago • 2 comments

Hi,

I saw that for text and embeddings there is a binary classification method: "Evidently trains a binary classification model to discriminate between data from reference and current distribution"

I want to use the same method for tabular data but didn't see it. Is it supported for tabular data? If not, how can I implement one?

Also, a general question about data drift: can it be run as a test in a test suite, or only as a report?

Thanks!

Miriam2040 avatar Aug 02 '23 06:08 Miriam2040

Hi @Miriam2040,

Classifier drift detection method

The classifier method is available for embeddings (docs here https://docs.evidentlyai.com/user-guide/customization/embeddings-drift-parameters) and for text data (docs here https://docs.evidentlyai.com/user-guide/customization/options-for-statistical-tests). For text data, it includes text-specific pre-processing.

It is not implemented for tabular data.

If you want to pass a custom drift detection method, here is the explanation of how to pass a custom function: https://docs.evidentlyai.com/user-guide/customization/add-custom-drift-method
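To illustrate the idea behind the classifier method on tabular data, here is a minimal, dependency-free sketch of a domain classifier: reference rows are labeled 0, current rows 1, a model is trained to tell them apart, and a held-out ROC AUC well above 0.5 is read as drift. This is not Evidently's implementation. The names `classifier_drift`, `fit_logistic`, and `roc_auc`, the 0.55 threshold, and the 70/30 split are all assumptions made for the example. It handles numerical columns only, and a linear model will only catch shifts that are linearly separable.

```python
import math
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_logistic(rows, labels, epochs=200, lr=0.1):
    """Plain batch gradient descent for logistic regression."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            err = 1 / (1 + math.exp(-z)) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def classifier_drift(reference, current, threshold=0.55, seed=0):
    """Return (auc, drift_detected) for two lists of numerical rows.

    The threshold is an assumption of this sketch, not a library default.
    """
    rows = [list(r) for r in reference] + [list(r) for r in current]
    labels = [0] * len(reference) + [1] * len(current)
    # Standardize each column using statistics of the combined data.
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        for r in rows:
            r[j] = (r[j] - mean) / std
    # Shuffle and hold out 30% of rows so the AUC is not measured on training data.
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    train, test = idx[:cut], idx[cut:]
    weights, bias = fit_logistic([rows[i] for i in train], [labels[i] for i in train])
    scores = [bias + sum(w * v for w, v in zip(weights, rows[i])) for i in test]
    auc = roc_auc([labels[i] for i in test], scores)
    return auc, auc > threshold


if __name__ == "__main__":
    rng = random.Random(42)
    reference = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
    current = [[rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(300)]  # first column shifted
    print(classifier_drift(reference, current))
```

In practice you would likely swap the hand-written logistic regression for a scikit-learn model and encode categorical columns before training; the labeling, hold-out split, and AUC check stay the same.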

Data drift test

Yes, you can use data drift detection as a Test Suite. There is a test preset (DataDriftTestPreset()), and separate tests you can choose from: TestNumberOfDriftedColumns(), TestShareOfDriftedColumns() for the dataset and TestColumnDrift(column_name='name') for individual columns.

You can see them in the example notebooks linked here https://docs.evidentlyai.com/examples

Or in the list of all tests: https://docs.evidentlyai.com/reference/all-tests#data-drift
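As a rough sketch, the tests named above can be combined in a single Test Suite as shown here. The import paths follow the Evidently API as it was around the time of this thread (0.4.x) and may differ in other versions; `run_drift_tests` is a hypothetical helper name, `'name'` is a placeholder column, and `reference` and `current` are assumed to be pandas DataFrames with the same columns.

```python
def run_drift_tests(reference, current, column_name="name"):
    """Run dataset-level and column-level drift tests and return the results as a dict.

    Evidently is imported inside the function so this module still loads
    where the library is not installed.
    """
    from evidently.test_suite import TestSuite
    from evidently.tests import (
        TestColumnDrift,
        TestNumberOfDriftedColumns,
        TestShareOfDriftedColumns,
    )

    suite = TestSuite(
        tests=[
            TestNumberOfDriftedColumns(),  # dataset level: count of drifted columns
            TestShareOfDriftedColumns(),   # dataset level: share of drifted columns
            TestColumnDrift(column_name=column_name),  # single column
        ]
    )
    suite.run(reference_data=reference, current_data=current)
    return suite.as_dict()
```

To run the whole preset instead of hand-picked tests, pass `DataDriftTestPreset()` (from `evidently.test_preset` in the same versions) in the `tests` list.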

elenasamuylova avatar Aug 02 '23 10:08 elenasamuylova

Great, thanks!

Miriam2040 avatar Aug 03 '23 05:08 Miriam2040