
Question about adjoint

Open · supertan0204 opened this issue on Jul 03, 2024

There is a problem when I run the following example:

import warp as wp
wp.init()

@wp.kernel
def test(x: wp.array(dtype=float), y: wp.array(dtype=wp.vec3)):
    tid = wp.tid()
    y[tid].x = x[tid] * 2.
    y[tid].y = x[tid] * 3.
    y[tid].z = x[tid] * 4.

# initialize
x = wp.array([1., 1., 1.], dtype=float, requires_grad=True)
y = wp.array([wp.vec3(0.,0.,0.),wp.vec3(0.,0.,0.),wp.vec3(0.,0.,0.)], dtype=wp.vec3, requires_grad=True)

# set initial gradient
y.grad = wp.array([wp.vec3(1.,1.,1.),wp.vec3(1.,1.,1.),wp.vec3(1.,1.,1.)], dtype=wp.vec3)

wp.launch( 
    kernel=test,
    dim=3,
    inputs=[x, y],
    adjoint=True,
    adj_inputs=[None, None]
)

print(x.grad)

In the above code, I wrote a simple test kernel that maps a wp.array with float dtype to a wp.array with wp.vec3 dtype. I am testing this kernel because I want to check whether Warp can be used to compute higher-dimensional tensor derivatives (in this example, $d\mathbf{y}/d\mathbf{x}$, where $\mathbf{y}\in\mathbb{R}^{3\times3}$ and $\mathbf{x}\in\mathbb{R}^3$). However, the output is just:

[0. 0. 0.]

I assumed the gradient would at least be non-zero, but it isn't. So does the Warp kernel adjoint support higher-dimensional tensor derivatives like this?
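For clarity, writing out the map the kernel computes (this derivation is mine, taken directly from the code above): for each thread $i$,

$$\mathbf{y}_i = (2x_i,\; 3x_i,\; 4x_i), \qquad \frac{\partial \mathbf{y}_i}{\partial x_j} = \delta_{ij}\,(2,\; 3,\; 4)^\top,$$

so with every component of $\mathbf{y}$'s adjoint seeded to 1, I would expect a non-zero gradient for every entry of $\mathbf{x}$.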

supertan0204 avatar Jul 03 '24 05:07 supertan0204

Thanks for your post, @supertan0204. We have an autograd bug associated with vector types such as vec3 that we're currently working on. It should be fixed in our next release, and that should resolve this issue.

daedalus5 avatar Jul 03 '24 14:07 daedalus5

Thanks for your reply @daedalus5, just a simple follow-up question: what would x.grad ideally be?

supertan0204 avatar Jul 03 '24 16:07 supertan0204

In the example you provided, it should be [9., 9., 9.]. The adjoint for each element in x is the sum adj_x[i] = 2.0 * adj_y[i].x + 3.0 * adj_y[i].y + 4.0 * adj_y[i].z.
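With the all-ones seed gradient set on y.grad above, each entry works out to

$$\text{adj\_x}[i] = 2.0 \cdot 1 + 3.0 \cdot 1 + 4.0 \cdot 1 = 9.$$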

daedalus5 avatar Jul 03 '24 17:07 daedalus5

This issue still occurs with the latest Warp version. Could we reopen the issue or look into this? Thanks! @daedalus5

sizhe-li avatar Feb 18 '25 20:02 sizhe-li

@sizhe-li, as of Warp 1.5, the following rewrite should work:

import warp as wp
wp.init()

@wp.kernel
def test(x: wp.array(dtype=float), y: wp.array(dtype=wp.vec3)):
    tid = wp.tid()
    a = wp.vec3()
    a.x = x[tid] * 2.
    a.y = x[tid] * 3.
    a.z = x[tid] * 4.
    y[tid] = a

# initialize
x = wp.array([1., 1., 1.], dtype=float, requires_grad=True)
y = wp.array([wp.vec3(0.,0.,0.),wp.vec3(0.,0.,0.),wp.vec3(0.,0.,0.)], dtype=wp.vec3, requires_grad=True)

# set initial gradient
y.grad = wp.array([wp.vec3(1.,1.,1.),wp.vec3(1.,1.,1.),wp.vec3(1.,1.,1.)], dtype=wp.vec3)

wp.launch(
    kernel=test,
    dim=3,
    inputs=[x, y],
    adjoint=True,
    adj_inputs=[None, None]
)

print(x.grad)

We don't yet have back-propagation working for the example that was originally posted, but it's on our radar. Would this sort of workaround be sufficient for what you're working on?
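As a quick sanity check (just a sketch assuming Warp 1.5 behaves as described above, not part of the original example), the resulting gradient can be compared against the hand-derived value of 9 per element:

import numpy as np

# each adjoint entry should be 2.0*1 + 3.0*1 + 4.0*1 = 9 (see the adjoint sum earlier in the thread)
assert np.allclose(x.grad.numpy(), 9.0)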

daedalus5 avatar Feb 18 '25 20:02 daedalus5

@daedalus5 Thank you so much for the quick reply! I am trying to produce a minimal example to be helpful. I upgraded from 0.6.1 to 1.5.0, and the same code (an implementation of the material point method), which functioned correctly before, now returns zero gradients...

Do you have any pointers on what the culprit might generally be? It is a large codebase, so I am trying to come up with a minimal example. Thank you again!

sizhe-li avatar Feb 18 '25 22:02 sizhe-li

There are some obscure uses of a local tape: https://github.com/PingchuanMa/NCLaw/blob/ff4e27a5dfdaa0b34687a0bf8292af9fb0ada8b0/nclaw/sim/mpm.py#L295-L315

I don't know whether the newer version conflicts with these "hacks".

The design of the PyTorch-Warp gradient communication is in this file.

Update: starting with version 0.10.1 my gradients become 0; everything works up to version 0.9.

Second update: rewriting things in adjoint form instead of using a tape fixed the problem...

sizhe-li avatar Feb 18 '25 22:02 sizhe-li

Yes, I was going to say I'm not sure whether your tape setup will work. You have two tapes, which in theory is okay, but in the link you shared I only see launches being recorded onto the local_tape(). In general I'd advise sticking to one tape. If you just need the tape to calculate a gradient, then launching the adjoint kernel directly is fine.
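For reference, a minimal sketch of the two styles, using the toy test kernel and the x/y arrays from the Warp 1.5 example earlier in this thread (illustrative only, not a drop-in for the NCLaw code):

# Option 1: record launches onto a single tape, then back-propagate through it
x.grad.zero_()  # clear any previously accumulated gradients
tape = wp.Tape()
with tape:
    wp.launch(test, dim=3, inputs=[x, y])
# seed the adjoint of the output and run the backward pass
tape.backward(grads={y: wp.array([wp.vec3(1., 1., 1.)] * 3, dtype=wp.vec3)})
print(x.grad)

# Option 2: skip the tape and launch the adjoint kernel directly
x.grad.zero_()
y.grad = wp.array([wp.vec3(1., 1., 1.)] * 3, dtype=wp.vec3)
wp.launch(test, dim=3, inputs=[x, y], adjoint=True, adj_inputs=[None, None])
print(x.grad)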

daedalus5 avatar Feb 19 '25 18:02 daedalus5