ChainRules.jl icon indicating copy to clipboard operation
ChainRules.jl copied to clipboard

Assume commutative multiplication exactly when necessary

Open sethaxen opened this issue 4 years ago • 4 comments

As noted in #504, there are a number of cases where types of rules were constrained to CommutativeMulNumber where commutation of multiplication did not need to be assumed. Likewise, there were places where commutativity was assumed but not enforced by a type constraint.

This PR fixes #504 by removing constraints where un-needed and adding others where needed. Because the trigonometric, hyperbolic, logarithmic, and exponential function rules all assume cummutativity, this puts constraints on a _large_number of rules. It's possible we don't want to do this, because there are certainly numeric types out there that are real (and therefore commutative) but do not subtype Real, and in this case these would not directly hit our rules. However, the approach taken here is much safer.

sethaxen avatar Oct 13 '21 20:10 sethaxen

I will leave this to @mcabbott to review. and unsubscribe. ping me if i am needed

oxinabox avatar Feb 15 '22 17:02 oxinabox

I propose we explicitly test rules with ::Number types using a non-commutative number. We could use Quaternions.Quaternion, but it lacks features like rand(::Quaternion) (see https://github.com/JuliaGeometry/Quaternions.jl/pull/42). We could pirate in the test suite to add this, or we could roll our own minimal quaternion implementation, like Julia base does (https://github.com/JuliaLang/julia/blob/bb5b98e72a151c41471d8cc14cacb495d647fb7f/test/testhelpers/Quaternions.jl). I think this would be the cleanest option, as if Quaternions adds explicit ChainRules support, that could mask potential issues.

Thoughts, @mcabbott?

sethaxen avatar Feb 16 '22 09:02 sethaxen

Implementing them here (like Base) sounds fine, but is testing with Quaternions going to quadruple the time for tests to run? Would be nice to avoid that if possible.

This closes #275 presume?

Not deep, but I have notation comments:

  • Can I vote for using LinearAlgebra's RealOrComplex instead of the more verbose CommutativeMulNumber everywhere? They should be exactly equivalent, but the short one nobody will have to look up to check.
  • If touching so many lines, I wonder if we ought to standardise letters for sensitivities a little, if not perfectly. I'd like to suggest that Δx be reserved for forward mode, and maybe we should start writing ∇x for reverse, at least for arrays?

(To my eye, vs are usually too small to mark what's a pretty important difference, at least when muddled up in things with NoTangent and so on, big bold words.)

mcabbott avatar Feb 22 '22 00:02 mcabbott

Implementing them here (like Base) sounds fine, but is testing with Quaternions going to quadruple the time for tests to run? Would be nice to avoid that if possible.

I think a bunch of scalar rules should have Quaternion tests, but probably only a few array rules, so I don't expect testing time to increase by that much (but we can check).

This closes #275 presume?

It supersedes #275 (forgot about that one; it's pretty stale now) and closes #504.

* Can I vote for using LinearAlgebra's `RealOrComplex` instead of the more verbose `CommutativeMulNumber` everywhere? They should be exactly equivalent, but the short one nobody will have to look up to check.

I think though that CommutativeMulNumber carries with it the semantic reason why we limit to Real and Complex, which is a safeguard against someone in the future accidentally changing it to be more or less restrictive.

* If touching so many lines, I wonder if we ought to standardise letters for sensitivities a little, if not perfectly. I'd like to suggest that `Δx` be reserved for forward mode, and maybe we should start writing `∇x` for reverse, at least for arrays?

(To my eye, vs are usually too small to mark what's a pretty important difference, at least when muddled up in things with NoTangent and so on, big bold words.)

I agree, the dots and bars are not great (and they're unicode characters often missing from people's devices). I seem to recall at some point we expressly advised people to use Δ for both input tangents and cotangents and for output tangents and cotangents. I'm not opposed to for cotangents, but I'd prefer we discuss that and agree on it outside of this PR. IMO this PR shouldn't be changing notation at all.

sethaxen avatar Feb 22 '22 19:02 sethaxen