ChainRulesCore.jl icon indicating copy to clipboard operation
ChainRulesCore.jl copied to clipboard

Add examples for `Tangent` and friends

Open mcabbott opened this issue 4 years ago • 6 comments

I was trying to figure out what Tangent and friends do. So I wrote some examples. And I wondered (1) why there isn't a method which accepts structs like Symmetric directly, and (2) where exactly the awkward fact that many LinearAlgebra types have a constructor which takes a Symbol but they store a Char:

julia> Tangent(Symmetric([1 2; 3 4]))
Tangent{Symmetric}(data = [1 2; 3 4], uplo = 'U')

julia> ChainRulesCore.construct(Symmetric, ChainRulesCore.backing(ans))
ERROR: MethodError: no method matching Symmetric(::Matrix{Int64}, ::Char)

Edit -- maybe I understand these points. Have updated this just to display the examples I wish I'd seen.

mcabbott avatar Jul 13 '21 17:07 mcabbott

Codecov Report

Merging #394 (fb98545) into master (4b33290) will decrease coverage by 0.42%. The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #394      +/-   ##
==========================================
- Coverage   89.96%   89.54%   -0.43%     
==========================================
  Files          15       15              
  Lines         638      641       +3     
==========================================
  Hits          574      574              
- Misses         64       67       +3     
Impacted Files Coverage Δ
src/differentials/composite.jl 81.14% <0.00%> (-2.05%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4b33290...fb98545. Read the comment docs.

codecov-commenter avatar Jul 13 '21 17:07 codecov-commenter

Trying to use this on abstract array types is not going to go good places most of the time. Because it is for tuples and structs.

oxinabox avatar Jul 13 '21 19:07 oxinabox

And I wondered (1) why there isn't a method which accepts structs like Symmetric directly,

Because you don't store things that are not part of the derivative in the tangent. Only things that you have derivatives for. So you more or less never need to take all the fields (possible exception is the use in ProjectTo, which we might need to revisit) See use in getproperty. Common is Tangent(foo=ȳ) for getproperty(x, :foo). https://github.com/JuliaDiff/ChainRules.jl/blob/6d6fa9e6a89067fe72c7e11b8d8b3429d4af703c/src/rulesets/LinearAlgebra/factorization.jl#L229-L238

and (2) where exactly the awkward fact that many LinearAlgebra types have a constructor which takes a Symbol but they store a Char:

Also solved by the fact that we don't store fields that are not used. Neither Symbol nor Char will ever have derivatives wrt them, so it isn't a problem for use.

oxinabox avatar Jul 13 '21 19:07 oxinabox

julia> ChainRulesCore.construct(Symmetric, ChainRulesCore.backing(ans))

construct doesn't always work. It doesn't work when the default constructor is not permitted. Its a internal function called from withing + If debug mode is on then + will give a good error message about what you need to do if construct fails. https://github.com/JuliaDiff/ChainRulesCore.jl/blob/master/src/differentials/composite.jl#L290-L304

oxinabox avatar Jul 13 '21 19:07 oxinabox

Ok, trying to route the generic ProjectTo via its Tangent method might be a terrible idea, and can be changed.

Tangent is for structs. Using it on a wrapper array type is pretty weird.

This seems a strange argument to make. They are structs. And the danger of testing things out on examples which you invent for the purpose is that the it's easy to overlook edge cases, because you will overlook them twice. LinearAlgebra is a rich source of weird test cases worth exploiting, if you want this thing to stand up to fairly generic use, as this business of recursing into structs indicates you do.

[To be clear, I don't just mean formal "regression testing", I mean trying things out and trying to find holes and seeing if the thing flies at all. And demonstrating this.]

Because you don't store things that are not part of the derivative in the tangent. Only things that you have derivatives for.

Since we do use say UpperTriangular as one representation of its own gradient, should calling Tangent on it normalise to that representation? Or is there no need for such a path, everything should always accept either?

I can see that construct is the trickier direction to build a path which always works.

mcabbott avatar Jul 13 '21 19:07 mcabbott

. And the danger of testing things out on examples which you invent for the purpose is that the it's easy to overlook edge cases, because you will overlook them twice

For sure, but testing is one thing. Documenting as the only example is another, because it leads people to thinking that is the normal path. Better no examples than ones that show a weird path, better still would be realistic examples.

Since we do use say UpperTriangular as one representation of its own gradient, should calling Tangent on it normalise to that representation? Or is there no need for such a path, everything should always accept either?

Ideally, I think ProjectTo{Tangent{UpperTriangular}} would do the convert. as we have seen that is nontrivial. So it might be that things need to accept both (at least til we can sort that out). We already do have a few such rules. Zygote has many such rules.

They are structs.

In general, with the addition ProjectTo, we are pushing away from the path where we use Tangent for things that are AbstractArrays that already represent a vector space. The other path we could have followed was that things either use Tangent, Number or Array of one of the former.

oxinabox avatar Jul 13 '21 20:07 oxinabox