
Tensor Functions

Open rleegates opened this issue 10 years ago • 7 comments

Hi Frédéric,

as far as I can tell, the package currently only supports functions that yield a scalar value. Any plans on extending this to tensor-valued functions of tensors?

Best regards, Robert

rleegates avatar Feb 20 '15 22:02 rleegates

Hi,

I had no plans for this, but yes, it is possible. Calculation time should be O(n). It would need some prior thinking about how to present the results, especially for higher-order derivatives. I have labelled your issue as an enhancement request (not sure I'll have time for this, though).

Don't know if this would work for you, but you have a workaround with the current version (a rough sketch follows the list):

  • end your expression with f[i] (f being the variable containing the tensor)
  • generate your derivative expression with rdiff
  • build the loop around the generated expression: for i in 1:n ; $expr ; end
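
Below is a rough, untested sketch of that recipe (a slight variation: the component index is spliced in as a literal before each call to rdiff, rather than looping over a symbolic i). It assumes rdiff's expression form rdiff(ex, x = x0) returning an expression that evaluates to (value, gradient); the function x .* x and all variable names are placeholders, not part of the package.

```julia
using ReverseDiffSource

n   = 3
x0  = rand(n)
jac = zeros(n, n)                 # jac[i, j] ≈ ∂f_i/∂x_j

for i in 1:n
    ex  = :( (x .* x)[$i] )       # scalar-valued: i-th component of the tensor output
    dex = rdiff(ex, x = x0)       # assumed to return an expr evaluating to (value, gradient)
    val, grad = eval(:( let x = $x0; $dex end ))
    jac[i, :] = grad
end
```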

fredo-dedup avatar Feb 26 '15 17:02 fredo-dedup

Hi Frédéric,

thank you for your quick reply. I'll try your workaround when I find the time, as I'm currently involved in another project. As far as I can tell, the workaround covers the differentiation of tensor-valued functions with respect to scalars; however, I'm sure it could be extended to more complicated structures.

Just FYI, what I'd ideally be looking for is the computation of partial and/or total derivatives of functionals of the type f_{ij}(g_{ij}(x_{ij}), k(x_{ij})), such that the derivative with respect to x yields d/dx_{kl} f_{ij} = df_{ij}/dg_{mn} dg_{mn}/dx_{kl} + df_{ij}/dk dk/dx_{kl}, in which either contractions (first term) or dyadics (second term) appear. I was pondering doing this symbolically; however, your package would be a nice alternative, as it would let me skip the code generation from the symbolic expression. In addition, my use cases become even more complicated when the tensor function is applied to the eigenvalues of x_{ij}, where I'm unsure whether symbolic computation will suffice.

If I can be of help in implementing such features, we could continue this discussion by email.
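
For readability, the same chain rule written out in display form (a direct transcription of the formula above; repeated indices m, n are summed over, and k here denotes the scalar function k(x), not an index):

```latex
\frac{\partial f_{ij}}{\partial x_{kl}}
  = \frac{\partial f_{ij}}{\partial g_{mn}} \frac{\partial g_{mn}}{\partial x_{kl}}
  + \frac{\partial f_{ij}}{\partial k} \frac{\partial k}{\partial x_{kl}}
```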

Best regards, Robert

rleegates avatar Feb 27 '15 00:02 rleegates

FYI, if just writing down all the gradients of lots of tensor-valued functions is the blocker, this has been done (at least twice) in the autograd-family of libraries.

In autograd: https://github.com/HIPS/autograd/blob/master/autograd/numpy/numpy_grads.py
In the Torch version of autograd: https://github.com/twitter/torch-autograd/blob/master/src/gradfuns.lua

EDIT: the most confusing gradients in the exhaustive lists linked above are the ones dealing with tensor resizing, indexing, and broadcasting. I'm happy to help and walk through the code with anyone interested in porting them to Julia.

alexbw avatar Jul 24 '16 15:07 alexbw

@alexbw I'm definitely interested in porting tensor gradients to Julia (e.g. see dfdx/Espresso.jl#2 for some details). Would you suggest any "entry point" to get started (either in code or in theoretical papers)?

dfdx avatar Aug 14 '16 16:08 dfdx

On the issue you linked, I think you're conflating the partial derivatives you need to write with the method you will use to perform automatic differentiation of output w.r.t. input. Indeed we do require functions to have scalar output in torch-autograd, but I believe autograd supports calculation of the Jacobian (non-scalar output) by doing multiple passes of the function, once per column of the Jacobian. So, if you get scalar-valued outputs working, you just need some small extra effort to get tensor-valued outputs.
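
A minimal sketch of that "Jacobian via repeated scalar passes" idea, in Julia. Here scalar_grad stands in for whatever reverse-mode gradient of a scalar-valued function is available (a crude finite-difference placeholder keeps the sketch self-contained); none of these names come from autograd, torch-autograd, or this package.

```julia
# Placeholder for a reverse-mode gradient of a scalar-valued f at x
# (finite differences used only to make the sketch runnable on its own).
function scalar_grad(f, x::Vector{Float64}; h = 1e-6)
    g = similar(x)
    for j in eachindex(x)
        xp = copy(x); xp[j] += h
        xm = copy(x); xm[j] -= h
        g[j] = (f(xp) - f(xm)) / (2h)
    end
    return g
end

# Jacobian of a vector-valued F: one scalar-output pass per output element,
# each pass filling one row ∂F_i/∂x of the Jacobian.
function jacobian(F, x::Vector{Float64})
    y = F(x)
    J = zeros(length(y), length(x))
    for i in eachindex(y)
        J[i, :] = scalar_grad(z -> F(z)[i], x)
    end
    return J
end

F(x) = [x[1]^2 + x[2], sin(x[2]) * x[3], x[1] * x[3]]
jacobian(F, [1.0, 2.0, 3.0])   # 3×3 Jacobian built from three scalar passes
```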

I would recommend just lifting the gradients from autograd or torch-autograd. In autograd the file is called "numpy_grads.py", I believe, and it's "gradfuns.lua" in torch-autograd.


alexbw avatar Aug 15 '16 12:08 alexbw

@alexbw Thanks for your answer. For the code you linked, am I right in saying that the gradients there are represented as Python/Lua functions that take previously computed gradients (i.e. the gradients of the arguments) and produce a new gradient for the current operation itself? Something like this:

grad_1 = make_gradient_myfunc(A, B)
grad_1(already_computed_gradients_of_A_and_B)

Also, I don't really understand the purpose of the ubiquitous unbroadcast function there. I can see that it sums out some dimensions of a tensor, but which ones, and why?

dfdx avatar Aug 16 '16 23:08 dfdx

Yes. The function signatures, for some function like sum(x, y), look like:

gradSum[1] = function(incomingGradient, answerOfSum, x, y) ... end
gradSum[2] = function(incomingGradient, answerOfSum, x, y) ... end

to calculate the partial gradients for each argument of sum(x,y)
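
A hedged Julia transcription of that convention, using matrix multiplication instead of sum so the math is unambiguous; the names (grad_mul, dC, etc.) are illustrative and not taken from either library:

```julia
# One gradient closure per argument of C = A * B. Each closure maps the
# incoming (upstream) gradient dC, the forward-pass answer C, and the
# original arguments to the partial gradient for its own argument.
grad_mul = Dict(
    1 => (dC, C, A, B) -> dC * B',   # gradient w.r.t. A
    2 => (dC, C, A, B) -> A' * dC,   # gradient w.r.t. B
)
```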

Unbroadcast (used to be called "sumToMatchShape") is used a lot to match gradient shapes when there has been replication. If you replicate a tensor in the forward pass, the action you must take in the backwards pass is to sum (not select) the replicated parts together.
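
In Julia terms, a minimal sketch of such an unbroadcast (the name and signature are assumptions for illustration, not the actual torch-autograd code): sum the upstream gradient over every dimension along which the original tensor was replicated, so the result recovers the original's shape.

```julia
function unbroadcast(x::AbstractArray, g::AbstractArray)
    # Sum away trailing dimensions that x does not have at all.
    for d in ndims(x)+1:ndims(g)
        g = sum(g, dims = d)
    end
    # Sum over dimensions where x had size 1 but was expanded by broadcasting.
    dims = Tuple(d for d in 1:ndims(x) if size(x, d) == 1 && size(g, d) > 1)
    if !isempty(dims)
        g = sum(g, dims = dims)
    end
    return reshape(g, size(x))
end

x  = ones(3, 1)           # broadcast across 4 columns in the forward pass
g  = rand(3, 4)           # upstream gradient arrives with the expanded shape
dx = unbroadcast(x, g)    # size (3, 1): replicated entries are summed, not selected
```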


alexbw avatar Aug 22 '16 14:08 alexbw