Add KeOps MMD detector

Open arnaudvl opened this issue 2 years ago • 13 comments

Add MMD detector using the KeOps (PyTorch) backend to further accelerate drift detection and scale up to larger datasets. This PR needs to be made compatible with the optional dependency management (incl. #538 and related).
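For readers unfamiliar with the statistic, here is a minimal pure-Python sketch of an unbiased squared-MMD estimate with a Gaussian RBF kernel, plus a permutation test for the p-value. It is an illustration only, not the code in this PR: the names `rbf`, `mmd2`, `permutation_p_value` and `sample` are hypothetical, and a KeOps-based implementation would evaluate the pairwise kernel sums symbolically rather than looping over or materialising every pair.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def rbf(a: Point, b: Point, sigma: float) -> float:
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(x: Sequence[Point], y: Sequence[Point], sigma: float) -> float:
    """Unbiased estimate of the squared MMD between samples x and y."""
    n, m = len(x), len(y)
    # Within-sample terms exclude the diagonal (i == j).
    k_xx = sum(rbf(x[i], x[j], sigma) for i in range(n) for j in range(n) if i != j)
    k_yy = sum(rbf(y[i], y[j], sigma) for i in range(m) for j in range(m) if i != j)
    # The cross term uses all n * m pairs.
    k_xy = sum(rbf(a, b, sigma) for a in x for b in y)
    return k_xx / (n * (n - 1)) + k_yy / (m * (m - 1)) - 2.0 * k_xy / (n * m)


def permutation_p_value(x: Sequence[Point], y: Sequence[Point], sigma: float,
                        n_permutations: int = 50, seed: int = 0) -> float:
    """Fraction of random re-splits whose MMD^2 is at least the observed value."""
    rng = random.Random(seed)
    observed = mmd2(x, y, sigma)
    pooled: List[Point] = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mmd2(pooled[:n], pooled[n:], sigma) >= observed:
            exceed += 1
    return exceed / n_permutations


def sample(n: int, mean: float, seed: int) -> List[List[float]]:
    """Hypothetical helper: n two-dimensional Gaussian points around `mean`."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]


x_ref = sample(30, 0.0, seed=1)    # reference data
x_same = sample(30, 0.0, seed=2)   # same distribution as the reference
x_shift = sample(30, 3.0, seed=3)  # shifted mean, i.e. drifted data

print("MMD^2 (no drift):", mmd2(x_ref, x_same, sigma=1.0))
print("MMD^2 (drift):   ", mmd2(x_ref, x_shift, sigma=1.0))
print("p-value (drift): ", permutation_p_value(x_ref, x_shift, sigma=1.0))
```

The cost is dominated by the pairwise kernel evaluations, which grow with the product of the sample sizes and are repeated for every permutation. That is the part a faster backend would target.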

This PR includes:

  • [x] MMD detector implementation using KeOps
  • [x] GaussianRBF kernel using KeOps
  • [x] Tests
  • [x] Docs
  • [x] Basic benchmarking example vs. PyTorch MMD
  • [x] Add a note to docs regarding lack of Windows support.
  • [x] Investigate segfault on macOS, or drop support for now.
  • [x] Document sigma_mean vs. sigma_median and make foolproof.
  • [x] Update keops infer_sigma check.
  • [x] Update the keops kernel docstrings to clarify the various dims options, and clarify them within the forward pass.
  • [x] Clarify GPU requirements and prettify example.
  • [x] Document the logic of the keops kernels more explicitly.
  • [x] Fully compatible tests with torch and tensorflow.
  • [ ] Unit test _mmd2.
  • [x] Exception -> error type in keops test.
  • [ ] Test sigma_mean in both the "usual" (non-batch) setting and the batch setting (the latter is unusual and should probably use the first batch entry, since it corresponds to the original (x, y)).
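To make the sigma_mean vs. sigma_median items above concrete, here is a rough sketch of two common bandwidth heuristics. The formulas and the function names (`sigma_median_sketch`, `sigma_mean_sketch`, `sigma_mean_batched_sketch`) are assumptions for illustration and may not match the library's implementations. The practical difference is that a mean only needs a sum over the pairwise distances, whereas a median needs them sorted.

```python
import math
import statistics
from typing import List, Sequence

Point = Sequence[float]


def pairwise_sq_dists(x: Sequence[Point], y: Sequence[Point]) -> List[float]:
    """All squared Euclidean distances between points of x and points of y."""
    return [sum((a - b) ** 2 for a, b in zip(xi, yj)) for xi in x for yj in y]


def sigma_median_sketch(x: Sequence[Point], y: Sequence[Point]) -> float:
    """Assumed median heuristic: sqrt(0.5 * median squared distance)."""
    return math.sqrt(0.5 * statistics.median(pairwise_sq_dists(x, y)))


def sigma_mean_sketch(x: Sequence[Point], y: Sequence[Point]) -> float:
    """Assumed mean heuristic: sqrt(0.5 * mean squared distance)."""
    d = pairwise_sq_dists(x, y)
    return math.sqrt(0.5 * sum(d) / len(d))


def sigma_mean_batched_sketch(xs: Sequence[Sequence[Point]],
                              ys: Sequence[Sequence[Point]]) -> float:
    """Batch setting, following the suggestion in the checklist: use only the
    first batch entry, which is taken to be the original (x, y)."""
    return sigma_mean_sketch(xs[0], ys[0])


x = [[0.0], [1.0]]
y = [[0.0], [3.0]]
print(sigma_median_sketch(x, y), sigma_mean_sketch(x, y))
```

The two heuristics can disagree noticeably when the distance distribution is skewed, which is one reason to document which one a given backend uses.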

Once this PR is merged, it will be followed up by a similar implementation for the Learned (Deep) Kernel detector.

arnaudvl avatar Jul 06 '22 11:07 arnaudvl

@arnaudvl I shall resolve these conflicts and then review once we have #537 merged.

I'm also adding @mauicv as a reviewer, specifically to check my additions with respect to optional dependency handling (once I've added them!)

ascillitoe avatar Jul 14 '22 09:07 ascillitoe

Additional note: Need to check suitable error is raised when passed to save_detector. Implementing save/load functionality can be left to a future PR.

ascillitoe avatar Jul 14 '22 09:07 ascillitoe

@arnaudvl I have merged in 0.10.0 associated changes from master into this PR. This primarily involves:

  • Updating the preprocessing kwargs (preprocess_x_ref etc.) in code, docs and tests.
  • Incorporating @mauicv's new optional deps functionality. keops support is now installed with pip install alibi-detect[keops]. @mauicv could you check this please, and also add some tox tests for the keops backend (pretty please 🙂).
  • Raising a NotImplementedError when the keops-based MMDDrift is passed to save_detector.
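The last point can be pictured with a small guard like the one below. This is a hypothetical sketch, not the actual save_detector code: `DetectorSketch` and `save_detector_sketch` are stand-in names, and the assumption that a detector exposes its backend as a `backend` attribute is mine.

```python
class DetectorSketch:
    """Hypothetical stand-in for a drift detector that records its backend."""

    def __init__(self, backend: str) -> None:
        self.backend = backend


def save_detector_sketch(detector: object, filepath: str) -> str:
    """Refuse to save keops-backed detectors until save/load is implemented."""
    # Assumption: the backend name is available as a plain string attribute.
    if getattr(detector, "backend", None) == "keops":
        raise NotImplementedError(
            "Saving detectors with the keops backend is not yet supported."
        )
    # A real implementation would serialise the detector here.
    return filepath
```

Failing early with an explicit error gives the user a clear message instead of a partially written or unloadable artefact.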

I seem to have introduced a mypy error which I shall investigate now (this might also be occurring because numpy or mypy have been updated since the tests were last run...)

ascillitoe avatar Jul 26 '22 16:07 ascillitoe

@arnaudvl re the mypy error, I have simply added a type ignore:

https://github.com/SeldonIO/alibi-detect/blob/0db2239e53996e1c56b01758dfb1dc2f7eafa2a0/alibi_detect/cd/keops/mmd.py#L164-L171

This error has started arising in the same place in the pytorch version too (https://github.com/SeldonIO/alibi-detect/issues/540), and we are ignoring it in the same way. The issue is that mypy loses track of the types of x_ref and x because of the preceding # type: ignore[assignment] comments. IMO we should ignore it for now and think about how we handle the type annotations for x_ref and x more generally in all our score and predict methods (as part of addressing #540).
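A toy reproduction of the pattern, using plain Python types instead of arrays and tensors, might look like this. The function names are hypothetical and the code is not taken from mmd.py. After an ignored reassignment, mypy still treats the variable as its declared type, so the next call is flagged as well. Binding the converted value to a new name is one possible way around it.

```python
from typing import List, Sequence, Tuple


def to_tuple(x: Sequence[float]) -> Tuple[float, ...]:
    """Stand-in for a conversion step (e.g. array -> tensor)."""
    return tuple(x)


def total(x: Tuple[float, ...]) -> float:
    """Stand-in for a function that only accepts the converted type."""
    return sum(x)


def score_with_ignore(x: List[float]) -> float:
    # Reassigning the parameter changes its runtime type; mypy reports an
    # "assignment" error here, which is silenced.
    x = to_tuple(x)  # type: ignore[assignment]
    # mypy still believes x is List[float], so a second ignore is needed.
    return total(x)  # type: ignore[arg-type]


def score_with_new_name(x: List[float]) -> float:
    # A fresh name gets the converted type, so no ignores are required.
    x_converted = to_tuple(x)
    return total(x_converted)
```

Both functions behave identically at runtime; the difference is only in what the type checker can follow.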

ascillitoe avatar Jul 27 '22 10:07 ascillitoe

test_mmd_keops passes on ubuntu builds but fails on macOS. We need to explore further or just say we don't support keops with macOS for now, and skip the tests...

I am tempted to officially support only Linux for now, since it seems prudent to test keops more on Windows and macOS, especially with GPUs, before supporting them.

ascillitoe avatar Jul 27 '22 10:07 ascillitoe

@ascillitoe

> Incorporating @mauicv's new optional deps functionality. keops support is now installed with pip install alibi-detect[keops]. @mauicv could you check this please, and also add some tox tests for the keops backend (pretty please 🙂).

As discussed offline, because keops is a backend it is not included in the test_dep_management tests. However, it makes sense to have a tox environment for testing it regardless. I've also added torch to the keops dependency bucket, as both are required by the MMD detector with the keops backend. This required some changes to the other torch test_dep_management tests.
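As a rough picture of a dependency bucket that needs both packages, here is a minimal sketch using only the standard library. `BACKEND_REQUIREMENTS`, `missing_modules` and `check_backend` are hypothetical names and do not reflect how the optional-dependency machinery in the repository is actually written.

```python
import importlib.util
from typing import Dict, List, Optional

# Hypothetical mapping from backend name to the modules it needs at import time.
BACKEND_REQUIREMENTS: Dict[str, List[str]] = {
    "keops": ["torch", "pykeops"],
}


def missing_modules(backend: str,
                    requirements: Optional[Dict[str, List[str]]] = None) -> List[str]:
    """Return the required top-level modules that cannot be found."""
    reqs = BACKEND_REQUIREMENTS if requirements is None else requirements
    return [name for name in reqs[backend] if importlib.util.find_spec(name) is None]


def check_backend(backend: str,
                  requirements: Optional[Dict[str, List[str]]] = None) -> None:
    """Raise an informative ImportError if a backend's dependencies are absent."""
    missing = missing_modules(backend, requirements)
    if missing:
        raise ImportError(
            f"Backend '{backend}' needs missing modules {missing}. "
            f"Try: pip install alibi-detect[{backend}]"
        )
```

Calling `check_backend("keops")` before constructing a detector would then fail with an install hint rather than a bare import error from deep inside the package.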

Will do a review now as well.

mauicv avatar Jul 27 '22 14:07 mauicv

> As talked about offline, because keops is a backend it's not included in the test_dep_management tests however it makes sense to have a tox environment for testing regardless. I've also added torch to the keops dependency bucket as they're both required in the MMD detector with the keops backend. This has required making some changes to the other torch test_dep_management tests.

Thanks @mauicv!

ascillitoe avatar Jul 27 '22 15:07 ascillitoe

In the cd_mmd_drift notebook we have:

> The notebook requires PyTorch and KeOps to be installed. Once PyTorch is installed, KeOps can be installed via pip

Could change that to pip install alibi-detect[keops] now.

mauicv avatar Jul 27 '22 16:07 mauicv

@arnaudvl I will wait for us to decide on what to do about Windows and macOS support before reviewing, so that I can review once tests are passing :)

ascillitoe avatar Jul 27 '22 16:07 ascillitoe

~~Just noticed we haven't added keops to the all optional dependency bucket. Is this intentional?~~ (Added keops to all)

mauicv avatar Jul 28 '22 08:07 mauicv

I have skipped the Windows tests since keops does not currently support Windows (https://github.com/getkeops/keops/issues/43). We still need to decide on macOS...
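A platform-based skip of this kind can be sketched with the standard library as follows. This is an illustration rather than the change made in the PR: `UNSUPPORTED_PLATFORMS`, `keops_tests_enabled` and the placeholder test class are hypothetical, and in a pytest suite the equivalent would be a `pytest.mark.skipif` condition on `sys.platform`.

```python
import sys
import unittest

# Assumption: platforms on which the KeOps tests are skipped. Adding "darwin"
# to this tuple would skip macOS as well, should that be the decision.
UNSUPPORTED_PLATFORMS = ("win32",)


def keops_tests_enabled(platform: str = sys.platform) -> bool:
    """True if KeOps tests should run on the given sys.platform value."""
    return not platform.startswith(UNSUPPORTED_PLATFORMS)


class TestKeopsPlaceholder(unittest.TestCase):
    @unittest.skipUnless(keops_tests_enabled(),
                         "KeOps does not currently support this platform")
    def test_placeholder(self) -> None:
        # A real test would build and run the keops-backed detector here.
        self.assertTrue(True)
```

Keeping the platform list in one place means a later decision on macOS only needs a one-line change.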

ascillitoe avatar Jul 28 '22 17:07 ascillitoe

The test_changed_notebooks tests are failing because all notebooks are skipped. This is a known issue: https://github.com/SeldonIO/alibi/issues/647

ascillitoe avatar Jul 29 '22 09:07 ascillitoe

Codecov Report

No coverage uploaded for pull request base (master@ed519e3). The diff coverage is n/a.

@@            Coverage Diff            @@
##             master     #548   +/-   ##
=========================================
  Coverage          ?   83.51%           
=========================================
  Files             ?      207           
  Lines             ?    13777           
  Branches          ?        0           
=========================================
  Hits              ?    11506           
  Misses            ?     2271           
  Partials          ?        0           

codecov-commenter avatar Aug 16 '22 15:08 codecov-commenter