Data Drift | Dataset volume

Open nagasaipureti opened this issue 1 year ago • 1 comments

Hi Team, Good Day!

We are trying to implement data drift detection in our project. In the latest Evidently documentation, we found that large datasets need to be sampled before being passed to Evidently. Could you specify an approximate dataset size, that is, the threshold above which we should sample the data before passing it to Evidently?

nagasaipureti avatar Aug 07 '23 09:08 nagasaipureti

Hi @nagasaipureti,

I am afraid we cannot give a precise answer here.

The performance varies based on your infrastructure, the number of rows/columns, and the exact metrics used (e.g., some drift detection methods are faster than others).

Also, the need for sampling depends on whether you run reports ad hoc in a notebook (where waiting a long time for a report to appear might be inconvenient) or in an automated pipeline (where it is more acceptable for the computation to take some time).

I'd suggest running a few tests on your sample datasets to develop your heuristics here.
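One way to run such tests is to time the drift computation on random samples of increasing size and pick the largest size that fits your time budget. The sketch below is only an illustration and not part of Evidently: `time_drift_check`, `largest_size_within_budget`, and the `run_check` callable are hypothetical names, and the data is assumed to be pandas DataFrames (anything with `len()` and a pandas-style `.sample(n=..., random_state=...)` would work).

```python
import time


def time_drift_check(reference, current, run_check, sample_sizes, seed=0):
    """Time `run_check` on random row samples of increasing size.

    `run_check(ref, cur)` is a hypothetical callable that wraps whatever
    drift computation you actually use; its return value is ignored.
    """
    timings = []
    for size in sample_sizes:
        # Cap the sample at the number of rows actually available.
        ref = reference.sample(n=min(size, len(reference)), random_state=seed)
        cur = current.sample(n=min(size, len(current)), random_state=seed)
        start = time.perf_counter()
        run_check(ref, cur)
        timings.append((size, time.perf_counter() - start))
    return timings


def largest_size_within_budget(timings, budget_seconds):
    """Return the largest tested size that ran within the budget, or None."""
    fitting = [size for size, seconds in timings if seconds <= budget_seconds]
    return max(fitting) if fitting else None
```

With the Evidently releases from around the time of this thread, `run_check` could, for example, build `Report(metrics=[DataDriftPreset()])` and call `report.run(reference_data=ref, current_data=cur)`; check this against the version you have installed, since the API may differ. The resulting threshold is specific to your infrastructure, column count, and chosen metrics, so it is worth repeating the measurement whenever those change.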

elenasamuylova avatar Aug 08 '23 12:08 elenasamuylova