torchdiffeq Initial augmented state on time

Hi,

Thanks for your great work! I am trying to understand why the initial augmented state for time is $-\frac{\partial L}{\partial t_1}$ rather than $\frac{\partial L}{\partial t_1}$. (To me, $\frac{\partial L}{\partial t_1}$ seems the more natural initial value.)

I have checked Algorithm 2 in the original paper, the code in this repository, and code and documents written by other people, but I can't find an explanation for the $-\frac{\partial L}{\partial t_1}$.

Could you explain the reason?

Thanks!

similar issue: #199

Feb 10 '23 13:02 rkqzw

Does the conversation from #166 help?

The quantity you're seeing is for computing dL/dt0, which intuitively has the opposite gradient direction compared to t1, because increasing t0 shortens the integration interval. The initial value for this gradient, when t0 = t1, is the negative of dL/dt1.
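
A rough sketch of why the sign flips, assuming $\mathbf{z}(t_0)$ is held fixed and writing $\mathbf{a}(t) = \partial L / \partial \mathbf{z}(t)$ for the adjoint (the helper $g(t)$ below is notation introduced only for this sketch). Moving the start time forward by a small $\epsilon$ with the same input is, to first order, the same as starting at $t_0$ from $\mathbf{z}(t_0) - \epsilon f(\mathbf{z}(t_0), t_0, \theta)$, so:

$$\frac{dL}{dt_1} = \mathbf{a}(t_1)^\top f(\mathbf{z}(t_1), t_1, \theta), \qquad \frac{dL}{dt_0} = -\,\mathbf{a}(t_0)^\top f(\mathbf{z}(t_0), t_0, \theta)$$

Define $g(t) = -\mathbf{a}(t)^\top f(\mathbf{z}(t), t, \theta)$. Using $\frac{d\mathbf{a}^\top}{dt} = -\mathbf{a}^\top \frac{\partial f}{\partial \mathbf{z}}$ and $\frac{d\mathbf{z}}{dt} = f$:

$$\frac{dg}{dt} = \mathbf{a}^\top \frac{\partial f}{\partial \mathbf{z}} f - \mathbf{a}^\top \left( \frac{\partial f}{\partial \mathbf{z}} f + \frac{\partial f}{\partial t} \right) = -\,\mathbf{a}^\top \frac{\partial f}{\partial t}, \qquad g(t_1) = -\frac{dL}{dt_1}, \qquad g(t_0) = \frac{dL}{dt_0}$$

So a time slot that starts at $-\frac{dL}{dt_1}$ and is integrated backwards with $-\mathbf{a}^\top \frac{\partial f}{\partial t}$ ends at $\frac{dL}{dt_0}$. For autonomous dynamics ($\partial f / \partial t = 0$) the two gradients are exact negatives of each other.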

Mar 06 '23 22:03 rtqichen

Thank you so much for your reply!

> Does the conversation from #166 help?
>
> The quantity you're seeing is for computing dL/dt0, which intuitively has the opposite gradient direction compared to t1, because increasing t0 shortens the integration interval. The initial value for this gradient, when t0 = t1, is the negative of dL/dt1.

This helps me understand that the initial value for the dL/dt0 calculation includes the negative of dL/dt1, but it raises another question. Let me explain.

The loss L in Eq. (3) of the original paper depends on z(t0) and the integral of f:

$L({\bf z}(t_1)) = L \left( {\bf z}(t_0) + \int_{t_0}^{t_1} f({\bf z}(t), t, \theta) dt \right)$

I think that t1 affects only the integral, but t0 affects both z(t0) and the integral, so gradients via z(t0) should be considered in the initial value for the dL/dt0 calculation. As I understand it, the conversation in #166 and your reply refer only to the gradient through the integral.

Could you explain why the gradients via z(t0) can be ignored?

Thanks!

Mar 10 '23 12:03 rkqzw

@rkqzw I believe it's because z(t0) doesn't actually depend on t0: it is the input to the model, which is treated as a constant.
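
Here is a small numerical check of that reading, as a sketch only. It does not use torchdiffeq: the dynamics `f(z, t) = -t * z`, the loss `L = z(t1)**2`, the hand-written RK4 solver `rk4`, and all other names are assumptions made up for illustration. It holds `z0` constant while perturbing `t0`, and compares that finite difference with a backward augmented solve whose time slot starts at `-dL/dt1`.

```python
def f(z, t):
    # Hypothetical non-autonomous dynamics dz/dt = -t * z (illustration only).
    return -t * z

def df_dz(z, t):
    return -t

def df_dt(z, t):
    return -z

def rk4(rhs, s0, t_start, t_end, steps=2000):
    """Fixed-step RK4 on a list-valued state; also works when t_end < t_start."""
    h = (t_end - t_start) / steps
    s, t = list(s0), t_start
    for _ in range(steps):
        k1 = rhs(s, t)
        k2 = rhs([x + 0.5 * h * k for x, k in zip(s, k1)], t + 0.5 * h)
        k3 = rhs([x + 0.5 * h * k for x, k in zip(s, k2)], t + 0.5 * h)
        k4 = rhs([x + h * k for x, k in zip(s, k3)], t + h)
        s = [x + h / 6.0 * (p + 2 * q + 2 * r + w)
             for x, p, q, r, w in zip(s, k1, k2, k3, k4)]
        t += h
    return s

def forward(z0, t0, t1):
    return rk4(lambda s, t: [f(s[0], t)], [z0], t0, t1)[0]

def loss(z0, t0, t1):
    return forward(z0, t0, t1) ** 2

def augmented_rhs(s, t):
    # State is (z, a, a_t): da/dt = -a * df/dz and da_t/dt = -a * df/dt.
    z, a, _ = s
    return [f(z, t), -a * df_dz(z, t), -a * df_dt(z, t)]

z0, t0, t1 = 1.5, 0.3, 1.0
z1 = forward(z0, t0, t1)
a1 = 2.0 * z1                 # dL/dz(t1) for L = z(t1)**2
dL_dt1 = a1 * f(z1, t1)       # gradient with respect to the end time

# Backward solve from t1 to t0, with the time slot initialised to -dL/dt1.
_, _, dL_dt0_adjoint = rk4(augmented_rhs, [z1, a1, -dL_dt1], t1, t0)

# Finite differences; z0 stays constant while t0 (or t1) is perturbed.
eps = 1e-5
dL_dt0_fd = (loss(z0, t0 + eps, t1) - loss(z0, t0 - eps, t1)) / (2 * eps)
dL_dt1_fd = (loss(z0, t0, t1 + eps) - loss(z0, t0, t1 - eps)) / (2 * eps)

print("dL/dt1 (adjoint, finite diff):", dL_dt1, dL_dt1_fd)
print("dL/dt0 (adjoint, finite diff):", dL_dt0_adjoint, dL_dt0_fd)
```

In this toy problem the two estimates of dL/dt0 should agree only when the time slot starts from `-dL_dt1`; starting it from `+dL_dt1` shifts the result by `2 * dL_dt1`. No extra term through z(t0) is needed, because `z0` never changes when `t0` does.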

Mar 12 '25 08:03 jason-vega