Data drift volume issues. Any workaround?
Hello,
I'm currently using Evidently to perform a data drift analysis on my dataset. The dataset has a total shape of (356,251 rows, 797 columns) for both reference and current data.
When I call run() in Evidently, it seems to run indefinitely. To give you an idea of the issue:
When I analyze 50 columns, it takes approximately 3 minutes. However, when I increase the number of columns to 100, the process takes about 23 minutes. I'm wondering if there's a workaround for this situation. One idea I had is to break down the analysis into smaller chunks, perhaps 50 columns at a time, and then merge the results into a single comprehensive report.
Additionally, having a progress bar feature would be extremely helpful to monitor the analysis's progress, especially in cases where it takes a significant amount of time.
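To make the chunking idea concrete, here is a minimal sketch in plain Python. It is not Evidently's API: `column_chunks`, `run_in_chunks`, and the `run_chunk` callback are hypothetical names, and `run_chunk` stands in for whatever builds and runs a drift report on a subset of columns and returns a per-column result. The printed counter is only a crude substitute for a real progress bar.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def column_chunks(columns: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` column names."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(columns), size):
        yield list(columns[start:start + size])


def run_in_chunks(
    columns: Sequence[str],
    size: int,
    run_chunk: Callable[[List[str]], Dict[str, object]],
) -> Dict[str, object]:
    """Run `run_chunk` on each group of columns and merge the per-column results."""
    merged: Dict[str, object] = {}
    total = len(columns)
    done = 0
    for chunk in column_chunks(columns, size):
        # `run_chunk` is a placeholder: it should return {column_name: result}.
        merged.update(run_chunk(chunk))
        done += len(chunk)
        print(f"{done}/{total} columns processed")
    return merged


def fake_run_chunk(chunk: List[str]) -> Dict[str, object]:
    """Stand-in for a real drift run; marks every column as 'not drifted'."""
    return {name: False for name in chunk}


all_columns = [f"col_{i}" for i in range(7)]
results = run_in_chunks(all_columns, size=3, run_chunk=fake_run_chunk)
```

In practice `run_chunk` would wrap the actual drift computation restricted to the given columns. The merged dictionary is a combined result, not a single Evidently report object, so a comprehensive HTML report would still need separate handling.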
Any guidance or suggestions would be greatly appreciated.
Thank you
True, column-level metric tests take a lot of time when you have many columns. Try:
- feature selection
- PCA
@YassineR Data drift test time depends a lot on:
- which test you're using
- type of column (numerical or categorical)
- structure of the data, particularly the number of unique values for categorical columns
So it's possible that your columns are just different, and some of them are much more expensive to test than others. To check this, you can run the test separately for each column and measure the time.
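A rough way to do that per-column timing, sketched in plain Python. The `time_per_column` helper and the `run_column` callback are hypothetical names, not part of Evidently. `run_column` stands in for a function that runs the drift test on one column. The fake test below only sleeps, to show the shape of the output.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_per_column(
    columns: Sequence[str],
    run_column: Callable[[str], object],
) -> List[Tuple[str, float]]:
    """Time `run_column` for each column; return (name, seconds), slowest first."""
    timings: List[Tuple[str, float]] = []
    for name in columns:
        start = time.perf_counter()
        run_column(name)  # placeholder for a single-column drift test
        timings.append((name, time.perf_counter() - start))
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings


def fake_drift_test(name: str) -> None:
    """Stand-in for a real test: pretends 'high_cardinality' is expensive."""
    time.sleep(0.05 if name == "high_cardinality" else 0.0)


timings = time_per_column(["age", "high_cardinality", "price"], fake_drift_test)
for name, seconds in timings:
    print(f"{name}: {seconds:.3f}s")
```

Columns at the top of the list are the ones to inspect first, for example categorical columns with many unique values.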