Curious if there's a way to get gradients wrt final instead of initial values for mutating functions
More of something I'm wondering than an issue. I posted this on the mailing list but I'm not sure anyone sees that.
Consider the following example:
```cpp
double f(double& x) {
    x = x * x;
    return x * x;
}
```
Here the gradient of f at x = 1 wrt the initial value of x is 4, and the gradient wrt the final value of x is 2. Enzyme calculates the former, but what if I'm interested in the latter? I get the impression that this is doable, but there's no API for it in Enzyme. What would it take to hack this on?
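For reference, here is how I understand the existing entry point gives the former. This is just a minimal sketch, assuming the usual `__enzyme_autodiff` calling convention with an `enzyme_dup` activity marker, and that the `double&` parameter lowers to a pointer in the IR:

```cpp
#include <cstdio>

// Enzyme's activity marker and reverse-mode entry point
extern int enzyme_dup;
extern void __enzyme_autodiff(void*, ...);

double f(double& x) {
    x = x * x;   // mutation: the initial value of x is overwritten here
    return x * x;
}

int main() {
    double x  = 1.0;
    double dx = 0.0; // shadow of x; accumulates d f / d x_initial
    __enzyme_autodiff((void*)f, enzyme_dup, &x, &dx);
    printf("d f / d x_initial = %f\n", dx); // 4.0 at x = 1
    return 0;
}
```

(Compiled with the Enzyme Clang plugin, e.g. via `-fplugin=ClangEnzyme-<version>.so`.)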
I'm still studying the internals of Enzyme and I'll probably find the answer to my question, but it would help me to hear how other people conceptualize this.
The motivation is to play with the idea of a neural network that can modify its own weights. After the forward pass the initial weights are gone, so what we want to optimize is the final weights. Not sure if this is even worth trying, but I've been thinking about it.
The way to do it is to not use AD over the entire function, but rather to split your function into two parts and use AD only over the latter one.
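Concretely, for the f above, the split might look like this. A sketch under the same `__enzyme_autodiff` convention as before; the helper names `step` and `readout` are hypothetical, and a single active scalar argument means the call returns its derivative directly:

```cpp
#include <cstdio>

extern double __enzyme_autodiff(void*, ...);

// the mutating first half: x <- x * x
double step(double x) { return x * x; }

// the second half, a pure function of the *final* x
double readout(double x) { return x * x; }

int main() {
    double x = 1.0;
    double x_final = step(x); // run the mutation normally, no AD here
    // differentiate only readout, seeded at the final value
    double dx_final = __enzyme_autodiff((void*)readout, x_final);
    printf("d f / d x_final = %f\n", dx_final); // 2.0 at x_final = 1
    return 0;
}
```

The point of the split is that the initial value of x never appears in the differentiated region, so the gradient you get is with respect to whatever state exists at the split point.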
To connect this to your NN example: you can just AD over the final state, instead of over the whole network.
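Same pattern with a weight buffer, as a sketch; `forward` and `loss` here are hypothetical stand-ins for the self-modifying pass and the objective, not anything from Enzyme:

```cpp
#include <cstdio>

extern int enzyme_dup;
extern int enzyme_const;
extern void __enzyme_autodiff(void*, ...);

// Hypothetical self-modifying "network": the forward pass overwrites w.
void forward(double* w, int n) {
    for (int i = 0; i < n; ++i)
        w[i] = w[i] * w[i]; // weight update destroys the initial weights
}

// Loss as a function of the *final* weights only.
double loss(double* w, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += w[i] * w[i];
    return s;
}

int main() {
    const int n = 3;
    double w[n]  = {1.0, 2.0, 3.0}; // initial weights
    double dw[n] = {0.0, 0.0, 0.0}; // shadow: gradient wrt final weights

    forward(w, n); // w now holds the final weights; the initial ones are gone

    // differentiate only the loss, seeded at the final weights
    __enzyme_autodiff((void*)loss, enzyme_dup, w, dw, enzyme_const, n);

    for (int i = 0; i < n; ++i)
        printf("d loss / d w_final[%d] = %f\n", i, dw[i]);
    return 0;
}
```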