Initial augmented state on time

Open rkqzw opened this issue 2 years ago • 3 comments

Hi,

Thanks for your great work! I am trying to understand why the initial augmented state for time is $-\frac{\partial L}{\partial t_1}$ rather than $\frac{\partial L}{\partial t_1}$. (To me, $\frac{\partial L}{\partial t_1}$ seems the more natural initial value.)

I have checked Algorithm 2 in the original paper, the code in this repository, and code and documents written by other people, but I can't find an explanation of the $-\frac{\partial L}{\partial t_1}$ term.

Could you explain the reason?

Thanks!

similar issue: #199

rkqzw avatar Feb 10 '23 13:02 rkqzw

Does the conversation from #166 help?

The quantity you're seeing is used to compute dL/dt0, which intuitively has the opposite gradient direction to dL/dt1, because increasing t0 shortens the integration interval. The initial value for this gradient, when t0 = t1, is the negative of dL/dt1.
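
One way to make the sign concrete, as a sketch that assumes the adjoint notation ${\bf a}(t) = \partial L / \partial {\bf z}(t)$ from the original paper (it is not used elsewhere in this thread) and treats ${\bf z}(t_0)$ as a fixed input:

$\frac{dL}{dt_1} = {\bf a}(t_1)^\top f({\bf z}(t_1), t_1, \theta), \qquad \frac{dL}{dt_0} = -{\bf a}(t_0)^\top f({\bf z}(t_0), t_0, \theta)$

Defining $g(t) = -{\bf a}(t)^\top f({\bf z}(t), t, \theta)$ and using the adjoint dynamics $\frac{d{\bf a}}{dt} = -{\bf a}^\top \frac{\partial f}{\partial {\bf z}}$, the terms involving $\frac{\partial f}{\partial {\bf z}}$ cancel and what remains is

$\frac{dg}{dt} = -{\bf a}(t)^\top \frac{\partial f}{\partial t}, \qquad g(t_1) = -\frac{dL}{dt_1}, \qquad g(t_0) = \frac{dL}{dt_0}$

Under these assumptions, integrating $g$ backward from $t_1$ to $t_0$ starts at $-\frac{\partial L}{\partial t_1}$ and ends at the gradient with respect to $t_0$.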

rtqichen avatar Mar 06 '23 22:03 rtqichen

Thank you so much for your reply!

> Does the conversation from #166 help?
>
> The quantity you're seeing is used to compute dL/dt0, which intuitively has the opposite gradient direction to dL/dt1, because increasing t0 shortens the integration interval. The initial value for this gradient, when t0 = t1, is the negative of dL/dt1.

This helps me understand that the initial value for the dL/dt0 calculation includes the negative of dL/dt1, but it raises another question. Let me explain.

The loss L in Eq. (3) of the original paper depends on z(t0) and the integral of f:

$L({\bf z}(t_1)) = L \left( {\bf z}(t_0) + \int_{t_0}^{t_1} f({\bf z}(t), t, \theta) dt \right)$

I think that t1 affects only the integral, whereas t0 affects both z(t0) and the integral, so the gradient through z(t0) should be included in the initial value for the dL/dt0 calculation. As I understand it, the conversation in #166 and your reply refer only to the gradient through the integral.

Could you explain why the gradient through z(t0) can be ignored?

Thanks!

rkqzw avatar Mar 10 '23 12:03 rkqzw

@rkqzw I believe it's because z(t0) doesn't actually depend on t0: it is the input to the model, which is treated as a constant.
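
A small numerical check may help here. The sketch below is not torchdiffeq code: it assumes hypothetical scalar dynamics `f(z, t) = z * t`, a hand-written fixed-step RK4 solver (`odeint_rk4`), and the loss `L = z(t1)`, all chosen only for illustration. Holding `z0` constant while varying `t0`, it compares a finite-difference estimate of dL/dt0 with `-a(t0) * f(z0, t0)`, where `a(t0) = dL/dz(t0)`.

```python
def f(z, t):
    # Hypothetical time-dependent dynamics for illustration: dz/dt = z * t
    return z * t


def odeint_rk4(z0, t0, t1, steps=2000):
    # Minimal fixed-step RK4 integrator from t0 to t1 (illustrative only)
    h = (t1 - t0) / steps
    z, t = z0, t0
    for _ in range(steps):
        k1 = f(z, t)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(z + h * k3, t + h)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return z


def loss(z0, t0, t1):
    # Assumed loss: L = z(t1), so dL/dz(t1) = 1
    return odeint_rk4(z0, t0, t1)


z0, t0, t1, eps = 1.5, 0.3, 1.2, 1e-5

# Central finite differences; z0 is held constant while t0 or t1 varies
dL_dt0_fd = (loss(z0, t0 + eps, t1) - loss(z0, t0 - eps, t1)) / (2 * eps)
dL_dt1_fd = (loss(z0, t0, t1 + eps) - loss(z0, t0, t1 - eps)) / (2 * eps)

# Adjoint at t0, a(t0) = dL/dz(t0), also estimated by finite differences
a_t0 = (loss(z0 + eps, t0, t1) - loss(z0 - eps, t0, t1)) / (2 * eps)

# Formulas under test: dL/dt0 = -a(t0) f(z0, t0), dL/dt1 = a(t1) f(z(t1), t1)
dL_dt0_formula = -a_t0 * f(z0, t0)
dL_dt1_formula = 1.0 * f(loss(z0, t0, t1), t1)

print("dL/dt0:", dL_dt0_fd, "vs", dL_dt0_formula)
print("dL/dt1:", dL_dt1_fd, "vs", dL_dt1_formula)
```

In this setup the two estimates of dL/dt0 should agree closely and have the opposite sign to dL/dt1, with no extra term from z(t0), because `z0` is passed in as a constant and never recomputed from `t0`.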

jason-vega avatar Mar 12 '25 08:03 jason-vega