
Jacobians and Op

ocramz opened this issue 3 years ago • 4 comments

When dealing with vector-valued functions of vector inputs, e.g. f :: V m -> V n, I'd expect the derivative (the Jacobian) to be an n × m matrix -- essentially a value of a different type than the input of f -- but the type of op1 doesn't seem to support this.
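For concreteness, here's how I read the API, as a sketch assuming hmatrix's static types (an illustration of my understanding, not code from the library). op1 has type op1 :: (a -> (b, b -> a)) -> Op '[a] b, so the second component is a function from the output's shape back to the input's shape -- a vector-Jacobian product -- and there's no slot for an n × m matrix:

```haskell
{-# LANGUAGE DataKinds #-}

import GHC.TypeLits (KnownNat)
import Numeric.Backprop (Op, op1)
import Numeric.LinearAlgebra (tr)
import Numeric.LinearAlgebra.Static (L, R, (#>))

-- A linear map as an op: the "gradient" handed to op1 is not the
-- full Jacobian but a function b -> a that pulls a downstream
-- gradient (shape of the output) back to the input's shape.
linOp :: (KnownNat m, KnownNat n) => L n m -> Op '[R m] (R n)
linOp a = op1 $ \x ->
  ( a #> x             -- forward result :: R n
  , \dy -> tr a #> dy  -- backward: J^T dy :: R m
  )
```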

What's the idiomatic backprop way of representing this situation?

ocramz · Dec 26 '20 08:12

On this note, I don't think the gradient of the softmax function defined in backprop-learn (https://github.com/mstksg/backprop-learn/blob/master/src/Backprop/Learn/Model/Function.hs#L122) does the right thing.
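For reference, here's the math I'd expect, as a plain-list sketch of my own (not backprop-learn's actual code): with s = softmax x, the Jacobian is diag s - s sᵀ, so the backward pass has to include the (dy · s) correction term rather than being a purely elementwise product:

```haskell
-- Plain-list sketch of softmax and its vector-Jacobian product
-- (no max-subtraction for numerical stability, for brevity).
softmax :: [Double] -> [Double]
softmax xs = map (/ total) es
  where
    es    = map exp xs
    total = sum es

-- With s = softmax x and downstream gradient dy:
--   J^T dy = s * (dy - <dy, s>)   (elementwise)
softmaxVJP :: [Double] -> [Double] -> [Double]
softmaxVJP s dy = zipWith (\si dyi -> si * (dyi - d)) s dy
  where
    d = sum (zipWith (*) dy s)
```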

ocramz · Dec 26 '20 11:12

@mstksg Sorry to bother you, but I could really use your input on this; it's been a head-scratcher for a few days.

ocramz · Jan 01 '21 22:01

No worries :) I don't mind a ping. I've been meaning to clean up my notifications these days, and I've lost a few of the ones I'd been meaning to follow up on.

In terms of a gradient for a V m -> V n, I can't remember if there is a way to access it directly in user space. But the "official" answer (in terms of how it comes up in things like ANNs) is that it's tracked behind the scenes, because the library is built around computing the gradient with respect to a final single scalar output. So you sort of have to frame your question in terms of the gradient of an "overall" vector-to-scalar function; in practice this means you differentiate a function returning a single scalar loss.
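Roughly, the intended shape is something like this (a sketch assuming the hmatrix-backprop bindings; objective and model are made-up names for illustration):

```haskell
import Data.Reflection (Reifies)
import GHC.TypeLits (KnownNat)
import Numeric.Backprop
import Numeric.LinearAlgebra.Static (R)
import Numeric.LinearAlgebra.Static.Backprop ((<.>))

-- Compose the vector-valued part with a scalar loss and
-- differentiate the whole pipeline; the V m -> V n derivative is
-- consumed internally by the reverse pass.
objective
  :: (KnownNat m, KnownNat n, Reifies s W)
  => (BVar s (R m) -> BVar s (R n))  -- vector-valued model
  -> R n                             -- target
  -> BVar s (R m)                    -- input
  -> BVar s Double                   -- single scalar loss
objective model target x = err <.> err  -- squared error
  where
    err = model x - constVar target
```

Then, for a model that's polymorphic over s, gradBP (objective model target) gives the gradient of the whole pipeline with respect to the input, and the same idea works for gradients with respect to parameters.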

You could sort of take advantage of/fake this by composing it with indexing into a given position, so f x !! 0 (pseudo-code) would be V m -> Double, and that would give you a V m gradient -- the first row of the Jacobian. f x !! 1 would give you the second row, f x !! 2 the third, and so on, and in the end you would backprop n times (once per output component) to populate your entire Jacobian matrix. But the library isn't quite designed for this, and such a thing would be kind of wasteful -- it's mostly designed to let you compose V m -> V n with V n -> Double (or other types of functions) to get some final Double result in the end, with the per-op gradients (vector-Jacobian products) computed and consumed under the hood.
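Here's what that row-by-row trick looks like concretely, using tuples and microlens's _1/_2 in place of real vector indexing (a toy sketch, not library code):

```haskell
import Data.Reflection (Reifies)
import Lens.Micro (_1, _2)
import Numeric.Backprop

-- f (x, y) = (x * y, x + y), built from BVar pieces; T2 packs two
-- BVars into a BVar of a tuple, and ^^. indexes into a BVar with a
-- lens.
f :: Reifies s W => BVar s (Double, Double) -> BVar s (Double, Double)
f v = T2 (x * y) (x + y)
  where
    x = v ^^. _1
    y = v ^^. _2

-- One reverse pass per output component; each pass yields one row
-- of the Jacobian:
--   gradBP ((^^. _1) . f) (2, 3)  ==  (3.0, 2.0)
--   gradBP ((^^. _2) . f) (2, 3)  ==  (1.0, 1.0)
```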

I think in theory the choice to support or not support vector-valued results is sort of arbitrary, and the implementation/math of everything should generalize to vector-valued results at the cost of some complexity in the API (you could implement the current API in terms of it). I think the decision to only allow a "singleton" scalar result was just to simplify the API, because I assumed most people's "overall" problem would be a single scalar loss. But if this isn't the case, the generalization should be straightforward, I think.

mstksg · Jan 02 '21 05:01

Thank you @mstksg for the explanation (and apologies for replying a mere 7 months later! Though I did read this back in January).

ocramz · Jul 18 '21 08:07