menelaus
MD3 API modifications
MD3's API differs from the other detectors, because it is intended for a semi-supervised context. It may be desirable to address some or all of these:
- Currently, MD3 has a `set_reference` method that takes a batch of data as the reference batch, which is inconsistent with the other streaming detectors. We could (1) leave this as is, (2) leave this as is and add `set_reference` to the other streaming detectors to make them consistent, (3) make MD3's API accept either an initial batch or stream-based data for setting the reference, or (4) change MD3 to accept only stream-based data for setting the reference.
- The `waiting_for_oracle` state and `give_oracle_label` method are unusual. You could imagine gathering the labeled samples as we go, rather than using an oracle function, and maintaining that as a buffer, with some scheme to "forget" "old-enough" labeled samples. This is not as-written in the paper, though.
- The number of requested oracle samples and the number of retraining samples are potentially decoupled in our implementation. It may be worth making note of this in the docstring.
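To make the buffer idea in the second bullet concrete, here is a minimal sketch, assuming the simplest possible forgetting scheme: keep only the most recent `max_size` labeled samples. `LabeledSampleBuffer` and its methods are hypothetical names for illustration; none of this exists in menelaus, and it is not the scheme described in the paper.

```python
from collections import deque


class LabeledSampleBuffer:
    """Hypothetical sliding buffer of labeled samples (not part of menelaus)."""

    def __init__(self, max_size):
        # A deque with maxlen silently drops the oldest entry once it is full,
        # which is the "forget old-enough samples" behavior assumed here.
        self._buffer = deque(maxlen=max_size)

    def add(self, X, y):
        """Store one labeled sample as it arrives."""
        self._buffer.append((X, y))

    def __len__(self):
        return len(self._buffer)

    def as_training_set(self):
        """Return the retained samples as (features, labels) lists."""
        X = [features for features, _ in self._buffer]
        y = [label for _, label in self._buffer]
        return X, y


# Usage: five labeled samples arrive, only the three newest are retained.
buffer = LabeledSampleBuffer(max_size=3)
for i in range(5):
    buffer.add([float(i)], i % 2)

X_retrain, y_retrain = buffer.as_training_set()
```

With a buffer like this, the retraining set would simply be whatever is in the buffer at drift time, so the number of retraining samples would be bounded by `max_size` rather than tied to a separate oracle request count.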
In fixing #96 and #95, MD3 was left alone. It still uses `DriftDetector` instead of `StreamingDetector`, so it does not get the input validation that `StreamingDetector` provides. Even if we did switch it to `StreamingDetector`, it still wouldn't inherit a `set_reference` method.
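As a rough illustration of option (3), a detector could accept its reference either as one batch or by accumulating the first few streamed samples. The sketch below is self-contained and entirely hypothetical: `StreamReferenceDetector`, `reference_size`, and the returned state strings are illustrative assumptions, not MD3's or `StreamingDetector`'s actual interface.

```python
class StreamReferenceDetector:
    """Hypothetical detector sketch; not menelaus's MD3 or StreamingDetector."""

    def __init__(self, reference_size):
        # Number of samples needed before monitoring starts (assumed parameter).
        self.reference_size = reference_size
        self.reference = []

    @property
    def reference_ready(self):
        return len(self.reference) >= self.reference_size

    def set_reference(self, X_batch):
        """Batch path: replace the reference with the given rows."""
        self.reference = [list(row) for row in X_batch]

    def update(self, X):
        """Stream path: fill the reference first, then start monitoring."""
        if not self.reference_ready:
            self.reference.append(list(X))
            return "collecting_reference"
        # Real drift logic would go here; this sketch only reports the state.
        return "monitoring"


# Stream-only usage: the first two samples become the reference.
streamed = StreamReferenceDetector(reference_size=2)
states = [streamed.update([float(i)]) for i in range(3)]

# Batch usage: the reference is supplied up front, so monitoring starts at once.
batched = StreamReferenceDetector(reference_size=2)
batched.set_reference([[0.0], [1.0]])
first_state = batched.update([2.0])
```

One design question this leaves open is what should happen when `set_reference` is given fewer rows than `reference_size`; in this sketch the stream path simply keeps filling the remainder.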