Enzyme.jl icon indicating copy to clipboard operation
Enzyme.jl copied to clipboard

Fix rules for `cholesky` and `ldiv!` on `Cholesky`

Open simsurace opened this issue 1 year ago • 59 comments

I had a suspicion that the existing rule was somehow off. I managed to break it with something like this:

function f(A)
    C = cholesky(A * A')
    return sum(abs2, C.L * C.U)
end
A = rand(3, 3)
test_reverse(f, Active, (A, Duplicated))

Working on a fix...

simsurace avatar Feb 25 '24 00:02 simsurace

I went back to ChainRules.jl and copied the rule from there. As noted previously, the rules looked completely different before the change, with the Enzyme rule just accumulating the factors field of the cotangent, while the ChainRules rule did a sequence of linear algebra operations. Now with the change the new test passes, but some of the old tests are broken. @michel2323 could you please double check whether the tests you wrote are correct?

simsurace avatar Feb 25 '24 00:02 simsurace

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 130 lines in your changes missing coverage. Please review.

Project coverage is 65.76%. Comparing base (ff9d320) to head (b2e922a). Report is 12 commits behind head on main.

Files Patch % Lines
src/internal_rules.jl 0.00% 130 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

:exclamation: There is a different number of reports uploaded between BASE (ff9d320) and HEAD (b2e922a). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (ff9d320) HEAD (b2e922a)
2 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1307       +/-   ##
===========================================
- Coverage   96.30%   65.76%   -30.54%     
===========================================
  Files           8       31       +23     
  Lines         406    12383    +11977     
===========================================
+ Hits          391     8144     +7753     
- Misses         15     4239     +4224     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Feb 25 '24 00:02 codecov-commenter

So the reverse rule for ldiv! for Cholesky was also wrong. I was able to fix that. Now only the forward rule needs to be fixed, I think that's why the test comparing forward and reverse results currently fail. EDIT: I'm not sure as I don't fully understand the current tests. Maybe it's better to replace by EnzymeTestUtils tests?

simsurace avatar Feb 26 '24 11:02 simsurace

@michel2323 I think the rules are now fixed. I hope you did not mind me reformatting your tests. @wsmoses if @michel2323 does not respond, we might need another reviewer. Maybe @sethaxen could also help with this rule?

simsurace avatar Feb 26 '24 15:02 simsurace

@michel2323 I think the rules are now fixed. I hope you did not mind me reformatting your tests. @wsmoses if @michel2323 does not respond, we might need another reviewer. Maybe @sethaxen could also help with this rule?

Go ahead. I went over it now. Looks good to me. I never could fully match the ChainRules implementation to this paper. But that may just be me. Thanks for taking a look at this!

michel2323 avatar Feb 27 '24 01:02 michel2323

I did not look at the paper in detail. Maybe their method is faster than the current implementation, which matches ChainRules.jl. We can always improve things later, in my opinion it's more important to merge a correct rule now. Maybe some of these rules will even become obsolete when Enzyme capabilities improve to also be able to autodiff through LAPACK routines. Thanks for taking a look!

simsurace avatar Feb 27 '24 06:02 simsurace

@wsmoses, I tried to follow your guidance on the fact, let me know if I got it right. I just noticed that I did not check the forward rule in the regression test, and in fact it also needs to be changed. Please let me fix that before merging so we get everything right this time.

simsurace avatar Feb 27 '24 06:02 simsurace

Ok, the additional forward unit tests for cholesky all fail, documenting there's still something wrong with the forward rule.

simsurace avatar Feb 27 '24 07:02 simsurace

I noticed there are still errors. Please don't merge just yet.

simsurace avatar Mar 02 '24 23:03 simsurace

Now we should probably refactor to target cholesky! instead of cholesky.

simsurace avatar Mar 02 '24 23:03 simsurace

Agreed, also I believe this rule assumes NoPivot, so you probably want to just target this method: https://github.com/JuliaLang/julia/blob/688ae0a58361889feedcff9eeb5ff1e1b710906b/stdlib/LinearAlgebra/src/cholesky.jl#L267-L271. You perhaps can expand the wrapped type from Matrix to StridedMatrix, and you could expand to RealHermSymComplexHerm.

sethaxen avatar Mar 04 '24 18:03 sethaxen

The issue is that currently this fails, so the easiest would be to write two rules, one for cholesky and one for cholesky!, but I'd rather first merge this in a correct state before further expanding it. Without some better tooling the effort of writing these rules is not sustainable for me.

simsurace avatar Mar 04 '24 18:03 simsurace

The in place reverse mode cholesky rule is going to be harder because the out of place one from Michel makes an assumption about the aliasing usage of the shadow (in essence it stuffs the derivative somewhere it shouldn't be, then uses it to set the correct result in a later stage).

This should be fixed to just do the proper thing, but FYI.

@sethaxen would you have cycles to work with @simsurace to set up the tooling they (and others) need?

wsmoses avatar Mar 04 '24 18:03 wsmoses

And yeah @simsurace you boldly (and successfully) started with some of the most difficult rules to do -- which have been left as todos for this reason!

wsmoses avatar Mar 04 '24 18:03 wsmoses

The in place reverse mode cholesky rule is going to be harder because the out of place one from Michel makes an assumption about the aliasing usage of the shadow (in essence it stuffs the derivative somewhere it shouldn't be, then uses it to set the correct result in a later stage).

Is this still true of this PR though? I pretty much changed every line by now. Could you indicate the questionable lines?

simsurace avatar Mar 04 '24 18:03 simsurace

Without some better tooling the effort of writing these rules is not sustainable for me.

@sethaxen would you have cycles to work with @simsurace to set up the tooling they (and others) need?

Spare cycles are a bit rare right now, but maybe I can do something. Which tooling is needed?

The issue is that currently this fails, so the easiest would be to write two rules, one for cholesky and one for cholesky!,

I didn't quite follow, could you explain why two rules are needed? I think cholesky for the base types effectively just copies and then calls cholesky!.

sethaxen avatar Mar 04 '24 19:03 sethaxen

See #1320, i.e. writing a rule for cholesky! won't be sufficient because the stuff in cholcopy fails to differentiate.

simsurace avatar Mar 05 '24 06:03 simsurace

I would suggest merging this now, keeping the focus to the correctness, and making the cholesky! rule in another PR.

As for tooling, two things would be helpful in my view: 1) making it so EnzymeTestUtils can be used to completely check the correctness of any rules, including those with wrapper types etc. (this means fixing some of the issues which are inherited from FiniteDifferences such as the treatment of to_vec). 2) developing some tooling to reduce the boilerplate in rules, i.e. dealing with caches, which seem to be pretty independent of the actual mathematical content, such that rule writers can focus more on the latter. As a side benefit, I think the rules will also get more readable and more uniformly styled.

simsurace avatar Mar 05 '24 07:03 simsurace

See #1320, i.e. writing a rule for cholesky! won't be sufficient because the stuff in cholcopy fails to differentiate.

Seems like that's an Enzyme issue that would need to be fixed either way.

As for tooling, two things would be helpful in my view: 1) making it so EnzymeTestUtils can be used to completely check the correctness of any rules, including those with wrapper types etc. (this means fixing some of the issues which are inherited from FiniteDifferences such as the treatment of to_vec).

I'll take a pass at this. It might be easier than I'm thinking.

  1. developing some tooling to reduce the boilerplate in rules, i.e. dealing with caches, which seem to be pretty independent of the actual mathematical content, such that rule writers can focus more on the latter. As a side benefit, I think the rules will also get more readable and more uniformly styled.

I agree this would be great, but I unfortunately cannot help with this. I suspect doing this correctly would require being able to identify what intermediates need to be cached and the best way to do this just by code analysis, which actually seems like a pretty complex problem to solve.

sethaxen avatar Mar 05 '24 09:03 sethaxen

FWIW, I think the safest way to go is to only define the rules only for LAPACK.potrf!, which are necessary until Enzyme can differentiate through LAPACK code. I don't much like covering up https://github.com/JuliaLang/julia/blob/892c49162a48e9646dc3f0f2b94ebb33ae2a76be/stdlib/LinearAlgebra/src/cholesky.jl#L196-L251 for the following reasons:

  1. It shouldn't be necessary. Enzyme should be able to AD through those methods.
  2. The "right" way to implement a Cholesky rule is to by hand differentiate the same path the primal takes. The primal function makes no assumption of the primal matrix being Hermitian; failure is an allowed option (and does not error if check=false). The rules that use linear algebra assume no failure and so will be wrong if the Cholesky factorization failed. For LAPACK we have no choice, but if we had the option to differentiate the LAPACK code directly, I would always recommend that.

sethaxen avatar Mar 05 '24 09:03 sethaxen

See #1320, i.e. writing a rule for cholesky! won't be sufficient because the stuff in cholcopy fails to differentiate.

Seems like that's an Enzyme issue that would need to be fixed either way.

As for tooling, two things would be helpful in my view: 1) making it so EnzymeTestUtils can be used to completely check the correctness of any rules, including those with wrapper types etc. (this means fixing some of the issues which are inherited from FiniteDifferences such as the treatment of to_vec).

I'll take a pass at this. It might be easier than I'm thinking.

  1. developing some tooling to reduce the boilerplate in rules, i.e. dealing with caches, which seem to be pretty independent of the actual mathematical content, such that rule writers can focus more on the latter. As a side benefit, I think the rules will also get more readable and more uniformly styled.

I agree this would be great, but I unfortunately cannot help with this. I suspect doing this correctly would require being able to identify what intermediates need to be cached and the best way to do this just by code analysis, which actually seems like a pretty complex problem to solve.

Ok, I do not expect anyone in particular to do this, but it would definitely make life easier.

simsurace avatar Mar 05 '24 14:03 simsurace

FWIW, I think the safest way to go is to only define the rules only for LAPACK.potrf!, which are necessary until Enzyme can differentiate through LAPACK code. I don't much like covering up https://github.com/JuliaLang/julia/blob/892c49162a48e9646dc3f0f2b94ebb33ae2a76be/stdlib/LinearAlgebra/src/cholesky.jl#L196-L251 for the following reasons:

1. It shouldn't be necessary. Enzyme should be able to AD through those methods.

2. The "right" way to implement a Cholesky rule is to by hand differentiate the same path the primal takes. The primal function makes no assumption of the primal matrix being Hermitian; failure is an allowed option (and does not error if `check=false`). The rules that use linear algebra assume no failure and so will be wrong if the Cholesky factorization failed. For LAPACK we have no choice, but if we had the option to differentiate the LAPACK code directly, I would always recommend that.

I agree, but I think there is a tradeoff and I'd rather merge this rule if it is correct, because it unlocks many applications and can boost the adoption of Enzyme.

I would rather prefer not having to write more rules for LAPACK routines, but I have no idea how to make it so Enzyme can differentiate them natively, nor how difficult that is.

simsurace avatar Mar 05 '24 14:03 simsurace

I agree that since this fixes a correctness issue, it should be merged soon. But I also think that after the cholmod issue is fixed, it should be replaced with a mutating version, ideally one that acts just on potrf!. I'll try to review this evening.

I have a functioning to_vec implementation in #1327

sethaxen avatar Mar 05 '24 16:03 sethaxen

Please add your questions as comments to specific lines and I'll try to answer to the best of my knowledge.

Edit: I won't be able to really explain _cholesky_pullback_shared_code beyond that's the ChainRules reference implementation which is battle tested and should apply directly to here.

simsurace avatar Mar 05 '24 16:03 simsurace

@sethaxen I wrote some tests from scratch using EnzymeTestUtils. They have the known limitations. I'm not sure that is better than the previous situation. We still have some disagreements about the forward rules, and no test to decide whether the implementation is correct. The current implementation however seems to give the right results in all applications I've tried. I hope we can find a way to move forward with this.

simsurace avatar Mar 17 '24 21:03 simsurace

@simsurace EnzymeTestUtils v0.1.7 removes most of the limitations of the testing functions, so that should allow some of the tests here to be simplified (i.e. no problem with Hermitian arguments and Cholesky outputs). This week I'll look closely at the points we disagreed on.

sethaxen avatar Apr 21 '24 07:04 sethaxen

That's great, looking forward to it.

simsurace avatar Apr 21 '24 09:04 simsurace

I just tested this PR locally with the following, and both forward- and reverse-mode tests fail:

julia> @testset "cholesky" begin
           @testset for Te in (Float64,), TS in (Symmetric, Hermitian)
               @testset for TA in (Const, Duplicated), Tret in (Const, Duplicated)
       
                   A = exp(TS(rand(Te, 4, 4)))
                   are_activities_compatible(Tret, TA) || continue
                   test_forward(cholesky, Tret, (A, TA))
                   test_reverse(cholesky, Tret, (A, TA))
               end
           end
       end

sethaxen avatar Apr 22 '24 15:04 sethaxen

I'll look into it

simsurace avatar Apr 22 '24 18:04 simsurace

Ok, I was able to fix the tests suggested above. I removed all previous tests and made new tests entirely with EnzymeTestUtils. Now, the remaining failures are from the forward rule of ldiv! on Cholesky, which I hadn't touched yet. I will address this later.

simsurace avatar Apr 27 '24 23:04 simsurace