MD3 API modifications

Open tms-bananaquit opened this issue 3 years ago • 1 comment

MD3's API differs from the other detectors, because it is intended for a semi-supervised context. It may be desirable to address some or all of these:

Currently, MD3 has a set_reference method that takes a batch of data as the reference batch, which is inconsistent with the other streaming detectors. We could (1) leave this as is, (2) leave this as is and add set_reference to the other streaming detectors for consistency, (3) make MD3's API accept either an initial batch or streamed data when setting the reference, or (4) change MD3 to accept only streamed data when setting the reference.
The waiting_for_oracle state and give_oracle_label method are unusual. One alternative would be to gather labeled samples as we go, rather than using an oracle function, and keep them in a buffer with some scheme to "forget" labeled samples that are old enough. This is not how the paper describes it, though.
The number of requested oracle samples and the number of retraining samples are potentially decoupled in our implementation. It may be worth noting this in the docstring.
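
To make the buffer idea in the second point concrete, here is a minimal sketch, assuming the simplest possible forgetting scheme: a fixed-size sliding window. LabeledBuffer, max_size, add, and retraining_set are all hypothetical names invented for illustration; nothing here exists in the library or the paper.

```python
from collections import deque


class LabeledBuffer:
    """Hypothetical sliding-window store of labeled samples.

    Assumption: "old enough" means "older than the most recent
    max_size labeled samples". Other schemes (time-based expiry,
    weighting) would need a different structure.
    """

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        # A deque with maxlen silently drops its oldest entry when full.
        self._items = deque(maxlen=max_size)

    def add(self, features, label):
        """Store one labeled sample, forgetting the oldest if full."""
        self._items.append((features, label))

    def retraining_set(self):
        """Return the buffered samples as parallel feature/label lists."""
        X = [features for features, _ in self._items]
        y = [label for _, label in self._items]
        return X, y

    def __len__(self):
        return len(self._items)


# Usage: five labeled samples arrive, only the latest three are kept.
buffer = LabeledBuffer(max_size=3)
for i in range(5):
    buffer.add([float(i)], i % 2)
X, y = buffer.retraining_set()
```

With a buffer like this, the retraining set size would be set by max_size rather than by the number of labels requested from the oracle, which is one way the decoupling mentioned in the third point could be made explicit.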

Jul 14 '22 18:07 tms-bananaquit

In fixing #96 and #95, MD3 was left alone. It's still using DriftDetector instead of StreamingDetector; the latter has input validation. Even if we did switch it to StreamingDetector, it still wouldn't inherit a set_reference method.
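
One way to picture the gap is the sketch below, which is purely illustrative. StreamingBase is a stand-in and not the library's StreamingDetector, and ReferenceMixin, SemiSupervisedDetector, reference_, and n_features_ are hypothetical names. It assumes set_reference would have to be supplied separately (here via a mixin) because the streaming base does not define it.

```python
class StreamingBase:
    """Stand-in for a streaming base class with input validation.

    NOT the library's StreamingDetector; the validation is a toy check.
    """

    def __init__(self):
        self.samples_seen = 0

    def update(self, sample):
        if not isinstance(sample, (list, tuple)) or len(sample) == 0:
            raise ValueError("sample must be a non-empty list or tuple")
        self.samples_seen += 1


class ReferenceMixin:
    """Hypothetical mixin that adds batch-style set_reference."""

    reference_ = None  # no reference batch until set_reference is called

    def set_reference(self, batch):
        rows = [list(row) for row in batch]
        if not rows:
            raise ValueError("reference batch must not be empty")
        widths = {len(row) for row in rows}
        if len(widths) != 1:
            raise ValueError("all reference rows must have the same length")
        self.reference_ = rows
        self.n_features_ = widths.pop()


class SemiSupervisedDetector(ReferenceMixin, StreamingBase):
    """Toy detector: streaming updates, but requires a reference first."""

    def update(self, sample):
        if self.reference_ is None:
            raise RuntimeError("call set_reference before update")
        # Delegates to StreamingBase.update for the validation step.
        super().update(sample)


# Usage: updating before a reference is set fails, afterwards it works.
det = SemiSupervisedDetector()
try:
    det.update([1.0, 2.0])
    failed_without_reference = False
except RuntimeError:
    failed_without_reference = True

det.set_reference([[0.0, 1.0], [2.0, 3.0]])
det.update([1.0, 2.0])
```

Moving set_reference onto the base class instead would correspond to option (2) in the original post; the mixin keeps it opt-in for detectors that need a reference batch.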

Aug 09 '22 18:08 tms-bananaquit