menelaus

MD3 API modifications

Open • tms-bananaquit opened this issue 3 years ago • 1 comment

MD3's API differs from the other detectors, because it is intended for a semi-supervised context. It may be desirable to address some or all of these:

  • Currently, MD3 has a set_reference method that takes a batch of data as the reference batch, which is inconsistent with the other streaming detectors. We could (1) leave this as is, (2) leave this as is and add set_reference to the other streaming detectors for consistency, (3) make MD3's API accept either an initial batch or stream-based data for setting the reference, or (4) change MD3 to accept only stream-based data for setting the reference.
  • The waiting_for_oracle state and give_oracle_label method are unusual. Rather than using an oracle function, we could gather labeled samples as we go and maintain them in a buffer, with some scheme to "forget" "old-enough" labeled samples. This is not how the paper describes the method, though.
  • The number of requested oracle samples and the number of retraining samples are potentially decoupled in our implementation. It may be worth noting this in the docstring.
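
The buffer idea in the second bullet could be sketched roughly as follows. This is only an illustration, not menelaus code and not the procedure from the MD3 paper: the class name LabeledBuffer, its max_age and n_retrain parameters, and the age-based forgetting rule are all assumptions made for the example. Because n_retrain is its own parameter, the sketch also reflects the decoupling mentioned in the third bullet.

```python
from collections import deque


class LabeledBuffer:
    """Hypothetical buffer of labeled samples; not part of menelaus."""

    def __init__(self, max_age, n_retrain):
        # Assumed forgetting rule: drop samples older than max_age steps.
        self.max_age = max_age
        # Number of labeled samples needed before retraining (n_retrain >= 1).
        self.n_retrain = n_retrain
        self._items = deque()  # (step, x, y) tuples, oldest first
        self._step = 0

    def tick(self):
        """Advance the stream clock by one sample and forget old labels."""
        self._step += 1
        while self._items and self._step - self._items[0][0] > self.max_age:
            self._items.popleft()

    def add(self, x, y):
        """Store a labeled sample observed at the current step."""
        self._items.append((self._step, x, y))

    def ready(self):
        """True once enough labeled samples are held to retrain."""
        return len(self._items) >= self.n_retrain

    def retraining_set(self):
        """Return the most recent n_retrain labeled samples as (X, y) lists."""
        recent = list(self._items)[-self.n_retrain:]
        return [x for _, x, _ in recent], [y for _, _, y in recent]
```

In a design like this, labels would be added whenever they happen to arrive, and the detector would check ready() instead of entering a dedicated waiting state.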

tms-bananaquit • Jul 14 '22 18:07

In fixing #96 and #95, MD3 was left alone. It still uses DriftDetector instead of StreamingDetector; the latter provides input validation. Even if we did switch it to StreamingDetector, it still wouldn't inherit a set_reference method.
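
For discussion, here is a rough sketch of what option (2) from the issue description might look like if set_reference lived on a shared streaming base class alongside input validation. None of this is menelaus's actual code: StreamingBase, _validate, _fit_reference, and the toy MeanShiftDetector are hypothetical names invented for the example, and the validation shown is far simpler than what a real detector would need.

```python
class StreamingBase:
    """Hypothetical stand-in for a streaming base class; not menelaus's StreamingDetector."""

    def __init__(self):
        self.input_dim = None

    def _validate(self, rows):
        """Coerce rows to lists of floats and check the column count is stable."""
        rows = [[float(v) for v in r] for r in rows]
        if not rows:
            raise ValueError("empty input")
        width = len(rows[0])
        if any(len(r) != width for r in rows):
            raise ValueError("rows have differing lengths")
        if self.input_dim is None:
            self.input_dim = width
        elif width != self.input_dim:
            raise ValueError("number of columns changed")
        return rows

    def set_reference(self, rows):
        """Shared entry point: validate a batch, then hand it to the subclass."""
        self._fit_reference(self._validate(rows))

    def _fit_reference(self, rows):
        raise NotImplementedError


class MeanShiftDetector(StreamingBase):
    """Toy detector used only to exercise the base class."""

    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold
        self.ref_mean = None

    def _fit_reference(self, rows):
        # Column-wise mean of the reference batch.
        self.ref_mean = [sum(col) / len(rows) for col in zip(*rows)]

    def update(self, x):
        """Return True if any feature of one sample deviates from the reference mean by more than threshold."""
        if self.ref_mean is None:
            raise RuntimeError("call set_reference first")
        (row,) = self._validate([x])
        return max(abs(a - b) for a, b in zip(row, self.ref_mean)) > self.threshold
```

With this shape, every subclass gets the same set_reference signature and the same validation, and only _fit_reference differs per detector.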

tms-bananaquit • Aug 09 '22 18:08