alibi-detect
test_ksdrift is flaky when all seed-setting code is removed
Introduction
The test test_ksdrift in alibi_detect/cd/tests/test_ks.py appears to be flaky when all seed-setting code (e.g. np.random.seed(0) or tf.random.set_seed(0)) is commented out.
For instance, in commit 1b06ecd37a08280d3bcff2b41b123a1f528afc0d (version 0.5.2), the cases test_ksdrift[368] through test_ksdrift[375] fail in roughly 4-12% of 500 runs when all seed-setting code is removed, compared to 0% of 500 runs when the seed-setting code is left in place.
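Because each rate comes from a finite number of runs, it helps to attach an uncertainty to it. The sketch below computes a Wilson score interval for a failure count out of N runs; the function name and the example counts are illustrative assumptions, not data from these experiments.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Hypothetical helper: ~95% Wilson score interval for a binomial failure rate."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p_hat = failures / runs
    denom = 1 + z * z / runs
    centre = (p_hat + z * z / (2 * runs)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Illustrative counts out of 500 runs: 0 (seeded), ~4% and ~12% (unseeded).
for failures in (0, 20, 60):
    low, high = wilson_interval(failures, 500)
    print(f"{failures}/500 failures: rate {failures / 500:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

Zero failures in 500 runs still leaves an upper bound of roughly 0.8%, so the seeded runs show that the failure rate is low rather than exactly zero.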
Tests 368-375 exercise the "less" alternative hypothesis of the KS drift detector with UAE preprocessing, across all combinations of: Bonferroni and FDR correction (correction = ['bonferroni', 'fdr']); latest sampling and reservoir sampling for updating the reference data (update_X_ref = [{'last': 1000}, {'reservoir_sampling': 1000}]); and whether the preprocessing step is applied to the reference data (preprocess_X_ref = [True, False]).
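The two correction options combine per-feature p-values into a single drift decision. A minimal sketch, assuming the detector runs one KS test per (preprocessed) feature and flags drift when the corrected test rejects for any feature; this is the textbook Bonferroni and Benjamini-Hochberg logic with hypothetical function names, not alibi-detect's code:

```python
from typing import Sequence


def bonferroni_drift(p_vals: Sequence[float], alpha: float = 0.05) -> bool:
    """Hypothetical helper: drift if any feature-wise p-value is below alpha / n_features."""
    return min(p_vals) < alpha / len(p_vals)


def fdr_drift(p_vals: Sequence[float], q: float = 0.05) -> bool:
    """Hypothetical helper (Benjamini-Hochberg): drift if any sorted p_(k) <= q * k / n."""
    n = len(p_vals)
    # k is 0-based here, so the k-th smallest p-value is compared with q * (k + 1) / n.
    return any(p <= q * (k + 1) / n for k, p in enumerate(sorted(p_vals)))


# Hypothetical p-values from four feature-wise KS tests.
p_vals = [0.02, 0.024, 0.5, 0.6]
print(bonferroni_drift(p_vals))  # False: 0.02 is not below 0.05 / 4 = 0.0125
print(fdr_drift(p_vals))         # True: 0.024 <= 0.05 * 2 / 4 = 0.025
```

One plausible source of the flakiness is that, without fixed seeds, the UAE weights and the sampled data change between runs, so p-values that sit near these thresholds can flip the decision from one run to the next.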
Motivation
Some tests fail at high rates once seeds are no longer fixed, but this goes unnoticed while the seeds are set, as with the test above. We are trying to stabilize such tests.
Environment
The tests were run with pytest 6.2.2 in a conda environment with Python 3.6.13 on Ubuntu 16.04.
Possible Solutions
One possible solution to reduce flakiness is to change the parameters used for prediction. We tried changing the following parameters.
Increasing n_infer from 2 to 10 does not seem to reduce the failure rate.
Increasing update_X_ref from 1000 to 3000 seems to reduce the failure rate to 2-5%.
Increasing update_X_ref to 7500 also reduces the failure rate to 2-5%, though the failures are distributed differently than with 3000.
Changing update_X_ref does not noticeably change the runtime.
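For reference, the two update_X_ref strategies can be sketched as follows, assuming {'last': n} keeps the n most recent instances and {'reservoir_sampling': n} keeps a uniform random sample of size n of everything seen so far (standard Algorithm R). The function names are hypothetical and this is not alibi-detect's implementation.

```python
import random
from typing import List, Optional, Tuple


def update_last(ref: List[int], new: List[int], n: int) -> List[int]:
    """Hypothetical helper: keep only the n most recent instances."""
    return (ref + new)[-n:]


def update_reservoir(ref: List[int], new: List[int], n: int, n_seen: int,
                     rng: Optional[random.Random] = None) -> Tuple[List[int], int]:
    """Hypothetical helper (Algorithm R): each instance seen so far is kept with
    probability n / n_seen. Returns the new reservoir and the updated count."""
    rng = rng or random.Random()  # unseeded by default, like the de-seeded tests
    ref = list(ref)
    for x in new:
        n_seen += 1
        if len(ref) < n:
            ref.append(x)  # reservoir not full yet: always keep
        else:
            j = rng.randrange(n_seen)  # uniform index in [0, n_seen)
            if j < n:
                ref[j] = x  # replace a random slot
    return ref, n_seen


rng = random.Random(0)  # seeded here only to make the demo reproducible
ref, seen = list(range(1000)), 1000
batch = list(range(1000, 1500))
ref_last = update_last(ref, batch, 1000)
ref_res, seen = update_reservoir(ref, batch, 1000, seen, rng)
print(len(ref_last), len(ref_res), seen)  # 1000 1000 1500
```

The reservoir variant consumes random numbers, so once seeds are removed it is itself a source of run-to-run variation, whereas the 'last' strategy is deterministic given the data.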
Please let me know whether this solution is feasible or whether there are other solutions we should incorporate. If you are interested, we can also send details of other tests that show similar behavior. We would be happy to open a pull request to fix the tests and incorporate any feedback you may have.