Initial augmented state on time
Hi,
Thanks for your great work! I am trying to understand why the initial augmented state for time is $-\frac{\partial L}{\partial t_1}$ rather than $\frac{\partial L}{\partial t_1}$. (To me, $\frac{\partial L}{\partial t_1}$ seems like the natural initial value.)
I have checked Algorithm 2 in the original paper, the code in this repository, and some code and documents written by other people, but I can't find an explanation for the $-\frac{\partial L}{\partial t_1}$.
Could you explain the reason?
Thanks!
similar issue: #199
Does the conversation from #166 help?
The quantity you're seeing is used to compute dL/dt0, which intuitively has the opposite sign to dL/dt1, because increasing t0 shortens the integration interval. The initial value for this gradient, when t0 = t1, is the negative of dL/dt1.
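A short derivation of the sign, as a sketch. It assumes $L$ depends on $t_0$ and $t_1$ only through ${\bf z}(t_1)$, that ${\bf z}(t_0)$ is held fixed, and it writes ${\bf a}(t) = \partial L / \partial {\bf z}(t)$ for the adjoint:

$$
\frac{\partial L}{\partial t_1} = {\bf a}(t_1)^\top f({\bf z}(t_1), t_1, \theta),
\qquad
\frac{\partial L}{\partial t_0} = -\,{\bf a}(t_0)^\top f({\bf z}(t_0), t_0, \theta)
$$

The first identity is the chain rule applied to $d{\bf z}/dt = f$ at the end point. The second says that starting a little later, from the same ${\bf z}(t_0)$, loses the displacement $f({\bf z}(t_0), t_0, \theta)\,dt$, whose effect on $L$ is measured by ${\bf a}(t_0)$. Setting $t_0 = t_1$ in the second expression gives

$$
\left.\frac{\partial L}{\partial t_0}\right|_{t_0 = t_1} = -\,{\bf a}(t_1)^\top f({\bf z}(t_1), t_1, \theta) = -\frac{\partial L}{\partial t_1},
$$

which is the value the backward solve starts from.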
Thank you so much for your reply!
> Does the conversation from #166 help?
> The quantity you're seeing is used to compute dL/dt0, which intuitively has the opposite sign to dL/dt1, because increasing t0 shortens the integration interval. The initial value for this gradient, when t0 = t1, is the negative of dL/dt1.
This helps me understand that the initial value for the dL/dt0 calculation is the negative of dL/dt1, but it raises another question. Let me explain.
L, as described in Eq. (3) of the original paper, depends on z(t0) and the integral of f.
$L({\bf z}(t_1)) = L \left( {\bf z}(t_0) + \int_{t_0}^{t_1} f({\bf z}(t), t, \theta) dt \right)$
I think that t1 affects only the integral, whereas t0 affects both z(t0) and the integral, so gradients via z(t0) should be included in the initial value for the dL/dt0 calculation. As far as I understand, the conversation in #166 and your reply refer only to gradients via the integral.
Could you explain why the gradients via z(t0) can be ignored?
Thanks!
@rkqzw I believe it's because z(t0) doesn't actually depend on t0: it is the input to the model, which is treated as a constant.
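A quick numerical check of both points, as a sketch rather than anything taken from this repository. It assumes a toy scalar ODE dz/dt = t * z, chosen only because it has a closed-form solution, with L = z(t1) and z(t0) = z0 held constant. The helper names `z_final` and `central_diff` are made up for this illustration.

```python
import math


def z_final(z0, t0, t1):
    """Closed-form solution z(t1) of dz/dt = t * z with z(t0) = z0."""
    return z0 * math.exp((t1 ** 2 - t0 ** 2) / 2.0)


def central_diff(fn, x, eps=1e-6):
    """Central finite-difference approximation of d fn / d x."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


# Toy setup (assumed values): loss L = z(t1), input z0 treated as a constant.
z0, t0, t1 = 1.5, 0.3, 1.0

# Gradients of L with respect to the two integration limits.
dL_dt1 = central_diff(lambda t: z_final(z0, t0, t), t1)
dL_dt0 = central_diff(lambda t: z_final(z0, t, t1), t0)

# Analytic values: dL/dt1 = a(t1) * f(z(t1), t1), dL/dt0 = -a(t0) * f(z0, t0),
# where f(z, t) = t * z and a(t) = dL/dz(t).
dL_dt1_exact = t1 * z_final(z0, t0, t1)
dL_dt0_exact = -t0 * z_final(z0, t0, t1)

# Initial value of the backward solve: both limits evaluated at t0 = t1.
dL_dt0_at_t1 = central_diff(lambda t: z_final(z0, t, t1), t1)
dL_dt1_at_t1 = central_diff(lambda t: z_final(z0, t1, t), t1)

print("dL/dt1:", dL_dt1, "analytic:", dL_dt1_exact)
print("dL/dt0:", dL_dt0, "analytic:", dL_dt0_exact)
print("at t0 = t1:", dL_dt0_at_t1, "vs", -dL_dt1_at_t1)
```

Because z0 is passed in as a fixed number, the only path from t0 to L is through the lower limit of the integral, and in this toy setup the finite-difference gradient with respect to t0 at t0 = t1 matches the negative of the gradient with respect to t1.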