Data Drift | Dataset volume
Hi Team, Good Day!
We are trying to implement data drift detection in our project. The latest Evidently documentation says that large datasets should be sampled before being passed to Evidently. Could you please specify an approximate dataset size above which we should apply sampling before passing the data to Evidently?
Hi @nagasaipureti,
I am afraid we cannot give a precise answer here.
The performance varies based on your infrastructure, the number of rows/columns, and the exact metrics used (e.g., some drift detection methods are faster than others).
Also, the need for sampling may vary based on whether you run reports ad hoc in your notebook (when waiting too long for a report to appear might be inconvenient) or run an automated pipeline (when it's more acceptable that the computation will take some time).
I'd suggest running a few tests on your sample datasets to develop your heuristics here.
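One way to run such tests is to time the drift computation on random samples of increasing size and keep the largest size that fits your time budget. The sketch below is only an illustration: `time_drift_check`, `pick_sample_size`, and `placeholder_compute` are hypothetical names, not part of Evidently, and `placeholder_compute` stands in for whatever report or metric you actually run. With a pandas DataFrame you would typically draw the sample with `df.sample(n=..., random_state=...)` instead of `random.sample`.

```python
import random
import time
from typing import Callable, Dict, List, Sequence


def time_drift_check(
    rows: Sequence,
    compute: Callable[[List], object],
    sample_sizes: Sequence[int],
    seed: int = 0,
) -> Dict[int, float]:
    """Time `compute` on random samples of the requested sizes.

    Sizes larger than the dataset are capped at len(rows).
    Returns a mapping {sample_size: elapsed_seconds}.
    """
    rng = random.Random(seed)  # fixed seed so the runs are repeatable
    timings: Dict[int, float] = {}
    for n in sample_sizes:
        n = min(n, len(rows))
        sample = rng.sample(rows, n)
        start = time.perf_counter()
        compute(sample)  # replace with your real drift computation
        timings[n] = time.perf_counter() - start
    return timings


def pick_sample_size(timings: Dict[int, float], budget_seconds: float) -> int:
    """Return the largest tested size that ran within the budget.

    If no size fits, fall back to the smallest size that was tested.
    """
    within = [n for n, t in timings.items() if t <= budget_seconds]
    return max(within) if within else min(timings)


def placeholder_compute(sample: List[float]) -> float:
    # Hypothetical stand-in: a real test would build and run the report here.
    return sum(sample) / len(sample)


# Toy data; in practice, use your reference/current datasets.
data = [random.random() for _ in range(50_000)]
results = time_drift_check(data, placeholder_compute, [1_000, 10_000, 50_000])
chosen = pick_sample_size(results, budget_seconds=1.0)
```

The budget is the part that depends on your setup: a few seconds may be the limit for ad hoc notebook use, while a scheduled pipeline can usually tolerate much more. Because the timings depend on your hardware and on the metrics chosen, the numbers are only meaningful when measured in the environment where the checks will actually run.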