NVDS: Questions about the spatial loss during training

Open onlyinheaven opened this issue 1 year ago • 0 comments

Dear NVDS authors,

Thank you for publishing this outstanding work. I had some questions while reading your paper.

Since the depth prediction network is frozen while the stabilization network is trained, I would like to understand why there is a spatial loss term L(t-1). As I understand it, at inference time the stabilization network takes four depth inputs and outputs the depth for the target frame only, without explicitly producing a depth for t-1. So why does training include a spatial loss term L(t-1)? Does the stabilization network output stabilized depth for all four frames at once? If not, is the network run twice in each training step: once on frames t-4 to t-1 to produce the depth for t-1, and once on frames t-3 to t to produce the depth for t, with the loss then computed over both outputs?
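The second possibility (two forward passes over shifted windows) can be sketched as follows. This only illustrates the question and is not the NVDS training code. `toy_stabilizer`, `spatial_loss`, `temporal_loss`, and `two_pass_losses` are hypothetical names, both losses are plain L1 stand-ins, and the temporal term compares the two outputs directly with no optical-flow warping, which is a simplifying assumption.

```python
import numpy as np


def toy_stabilizer(window):
    """Hypothetical stand-in for the stabilization network.

    Takes four depth maps of shape (4, H, W) and returns one depth map
    (H, W) for the last frame in the window. A real network would be
    learned; averaging is used here only so the sketch runs.
    """
    assert window.shape[0] == 4, "expects a window of four frames"
    return window.mean(axis=0)


def spatial_loss(pred, gt):
    # Assumed L1 spatial term between a prediction and its ground truth.
    return float(np.abs(pred - gt).mean())


def temporal_loss(pred_t, pred_prev):
    # Assumed simplified temporal term: direct L1 difference between
    # consecutive outputs, without any flow-based warping.
    return float(np.abs(pred_t - pred_prev).mean())


def two_pass_losses(initial_depths, gt_depths, t):
    """Run the stabilizer twice on shifted windows and collect the losses.

    initial_depths: (T, H, W) depths from the frozen depth predictor.
    gt_depths:      (T, H, W) ground-truth depths.
    t:              target frame index, must satisfy 4 <= t < T.
    """
    # Pass 1: frames t-4 .. t-1 produce the stabilized depth for t-1.
    pred_prev = toy_stabilizer(initial_depths[t - 4:t])
    # Pass 2: frames t-3 .. t produce the stabilized depth for t.
    pred_t = toy_stabilizer(initial_depths[t - 3:t + 1])
    return {
        "spatial_t": spatial_loss(pred_t, gt_depths[t]),
        "spatial_t_minus_1": spatial_loss(pred_prev, gt_depths[t - 1]),
        "temporal": temporal_loss(pred_t, pred_prev),
    }


# Toy data: five frames of constant depth plus noise from the "predictor".
rng = np.random.default_rng(0)
gt = np.ones((5, 2, 3))
noisy = gt + 0.1 * rng.standard_normal(gt.shape)
losses = two_pass_losses(noisy, gt, t=4)
```

Under this reading, the L(t-1) term would come from the first pass and the temporal term would reuse the same t-1 output, so each training step would need two forward passes of the stabilization network.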

Apart from this question, I would also like to understand how the temporal loss is computed during training, and in particular how the t-1 depth it uses is obtained.

Thank you for your clarification.

Jan 02 '24 17:01 onlyinheaven
