DistributionsAD.jl icon indicating copy to clipboard operation
DistributionsAD.jl copied to clipboard

Add Enzyme tests

Open devmotion opened this issue 2 years ago • 16 comments

I had tried to run Enzyme tests, and it seems currently quite a few tests fail/error.

devmotion avatar Jul 15 '23 20:07 devmotion

The fact that it doesn't segfault in so many tests is a good sign.

Related: https://github.com/TuringLang/Turing.jl/pull/1887

yebai avatar Jul 17 '23 13:07 yebai

@devmotion can you rerun on latest Enzyme main?

wsmoses avatar Aug 01 '23 18:08 wsmoses

fyi the test failures appear to be due to currently unimplemented derivatives of lapack functions. probably should disable those tests.

wsmoses avatar Aug 05 '23 07:08 wsmoses

@devmotion @yebai at this point, everything in this PR works except:

  • differentiating through Lapack functions is not yet supported by Enzyme
  • your test code uses an active return for a function which returns a vector (this must be duplicated).

Moreover all the errors are relatively nice.

at this point there aren't Enzyme bugs here and merging this is now on your end (presumably marking the code using lapack as unsupported atm, and fixing your test handler to not use an active vector for that input)

wsmoses avatar Aug 07 '23 20:08 wsmoses

Thank you, I'll try to have a look at and fix the remaining issues this week.

devmotion avatar Aug 08 '23 07:08 devmotion

The latest commit added a way to mark some Enzyme tests as broken (and I also introduced a bug in the test setup :smile:) but it also reveals that there are many more issues - so far only a tiny initial testset was run but no tests of the log densities of the uni-, multi- and matrixvariate distributions we're actually interested in.

devmotion avatar Aug 15 '23 21:08 devmotion

@wsmoses Is this an enzyme bug or did I call autodiff incorrectly? https://github.com/TuringLang/DistributionsAD.jl/actions/runs/5872312798/job/15923610267?pr=254#step:6:303

devmotion avatar Aug 16 '23 07:08 devmotion

Definitely seems like a bug (that we need a minimal reproducer for)

On Wed, Aug 16, 2023 at 4:26 PM David Widmann @.***> wrote:

@wsmoses https://github.com/wsmoses Is this an enzyme bug or did I call autodiff incorrectly? https://github.com/TuringLang/DistributionsAD.jl/actions/runs/5872312798/job/15923610267?pr=254#step:6:303

— Reply to this email directly, view it on GitHub https://github.com/TuringLang/DistributionsAD.jl/pull/254#issuecomment-1680100118, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTUXBUFJ5TP4AJUKMV3ADXVRY3HANCNFSM6AAAAAA2LPVOW4 . You are receiving this because you were mentioned.Message ID: @.***>

wsmoses avatar Aug 16 '23 07:08 wsmoses

One slightly annoying thing I found about the Enzyme error message is printing a large amount of LLVM code when compilation fails (see here for an example). This is useful when developing Enzyme (at least in the early stages) but not very helpful for users of Enzyme. Very few people can read such LLVM code and figure out why Enzyme couldn't compile. They also add clutter to CI logs. I suggest disabling such LLVM messages by default but providing an API for developers to activate them when needed.

yebai avatar Aug 16 '23 09:08 yebai

@devmotion did you have a chance to isolate that 1.6 error that crashes Julia (so we can ensure it does not)?

wsmoses avatar Sep 14 '23 19:09 wsmoses

@devmotion can you repush (its not letting me rerun tests for some reason)

wsmoses avatar Sep 22 '23 16:09 wsmoses

@devmotion hm I still can't retrigger this, if you can have the tests run again.

wsmoses avatar Oct 11 '23 04:10 wsmoses

@wsmoses you should have the priveledges to trigger CI (re-)runs -- can you try again?

yebai avatar Oct 22 '23 15:10 yebai

I reran CI against the latest Enzyme.jl v0.12.19. They are now failing with either (mac os)

Assertion failed: (vec.back() == currentBlock), function addReverseBlock, file /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp, line 8952.

or (ubuntu)

 julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:8952: llvm::BasicBlock* GradientUtils::addReverseBlock(llvm::BasicBlock*, const llvm::Twine&, bool, bool): Assertion `vec.back() == currentBlock' failed.

See the failed CI runs for details.

@wsmoses, is this an issue you are aware of?

mhauru avatar Jun 26 '24 09:06 mhauru

Nope not aware of please open MWE

wsmoses avatar Jun 26 '24 13:06 wsmoses

Gathering links to some related Enzyme issues: https://github.com/EnzymeAD/Enzyme.jl/issues/1585 https://github.com/EnzymeAD/Enzyme.jl/issues/1593 https://github.com/EnzymeAD/Enzyme.jl/issues/1594 https://github.com/EnzymeAD/Enzyme.jl/issues/1595 https://github.com/EnzymeAD/Enzyme.jl/issues/1596

mhauru avatar Jul 02 '24 09:07 mhauru

@yebai Looks like the issues here are that Enzyme actually passes tests it was marked as failing?

wsmoses avatar Jul 09 '24 14:07 wsmoses

yes, most tests are passing with most recent Enzyme.

yebai avatar Jul 09 '24 14:07 yebai

@devmotion do you know why we are testing Enzyme only on Others tests?

yebai avatar Jul 09 '24 15:07 yebai

I think Enzyme might still struggle with a few MvNomal tests, which was revealed after 8b9fa77

yebai avatar Jul 09 '24 22:07 yebai

open issues and we'll get those sorted!

wsmoses avatar Jul 10 '24 01:07 wsmoses

Oh yeah runtime activity isn't supported yet for forward mode blas, so my recommendation here is to turn on runtime activity more selectively for which tests need it

wsmoses avatar Jul 10 '24 01:07 wsmoses

Some open issues that are causing test failures here. Some might just need marking as broken, others seem more in need of fixing:

  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1620 (marked as broken)
  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1603 (marked as broken)
  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1601
  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1599
  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1649
  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1629
  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1626
  • ~~https://github.com/EnzymeAD/Enzyme.jl/issues/1623~~
    • Likely a Julia compiler bug
  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1621
  • [ ] https://github.com/EnzymeAD/Enzyme.jl/issues/1696

See also:

  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1625
  • [x] https://github.com/EnzymeAD/Enzyme.jl/issues/1634

I was in the process of checking if I could find more issues in this test suite, but got distracted by https://github.com/TuringLang/DynamicPPL.jl/issues/643 and related issues. Will get back to this once I'm done minimising what I've found there.

mhauru avatar Jul 10 '24 09:07 mhauru

I marked distributions that involve ccalls without rules as broken so CI won't fail. Enzyme seem to run in most cases but could return incorrect gradients some distribution density functions.

yebai avatar Aug 21 '24 20:08 yebai

Cool, open an issue for whatever is wrong and we’ll get it fixed

wsmoses avatar Aug 21 '24 20:08 wsmoses

Draft PR for doing the reverse, namely adding Distributions.jl integration tests into Enzyme: https://github.com/EnzymeAD/Enzyme.jl/pull/1819

mhauru avatar Sep 13 '24 10:09 mhauru

We're going to need individual issues for any failures. The PR above doesn't presently work as it hits a version issue with QuadGK.

wsmoses avatar Sep 15 '24 04:09 wsmoses

Closed in favour of https://github.com/EnzymeAD/Enzyme.jl/pull/1819

yebai avatar Oct 03 '24 20:10 yebai