
Detaching l_t

Open Pozimek opened this issue 5 years ago • 4 comments

At the moment the location tensor l_t is never detached from the computational graph, even though it is both produced and 'consumed' by trainable modules. As far as I understand the code, this lets gradients 'backpropagate through time' in a way the authors of RAM did not intend: gradients originating in the action_network and reaching the fc2 layer inside the glimpse network would travel back to the previous timestep's location_network and alter its weights, stopping only once they reach the detached RNN memory vector h_t. As far as I understand, the authors intended the location_network to be trained with reinforcement learning only.
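
A minimal sketch of the detach I have in mind (class and attribute names here are my own assumptions about how such a location head could look, not the repo's exact code):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class LocationNetwork(nn.Module):
    """Sketch of a RAM-style location head; names and shapes are assumptions."""

    def __init__(self, input_size, output_size, std):
        super().__init__()
        self.std = std
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, h_t):
        # h_t.detach(): the location head should not push gradients into the
        # core RNN state; it is meant to be trained by REINFORCE only.
        mu = torch.tanh(self.fc(h_t.detach()))

        # Reparameterised sample: without a later detach, l_t keeps a grad
        # path back to mu and hence to this head's weights.
        l_t = Normal(mu, self.std).rsample()

        # Log-probability of the sampled location, used in the REINFORCE loss.
        log_pi = Normal(mu, self.std).log_prob(l_t).sum(dim=1)

        # Detach l_t so that, when the next glimpse network consumes it, the
        # classification gradients cannot reach this head's weights.
        l_t = torch.clamp(l_t.detach(), -1.0, 1.0)
        return log_pi, l_t
```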

This could be a bug, or it could be an accidental improvement to the network; either way, please let me know whether my understanding is correct, as I am still learning PyTorch and my project relies heavily on your code :)

Pozimek avatar Jan 21 '20 18:01 Pozimek

Yes, I agree. Same confusion here. The authors say: "The location network is always trained with REINFORCE." So should we build a separate loss function for it?
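
A rough sketch of the hybrid objective I think is meant, where only the REINFORCE term updates the location network (the function and variable names below are mine, and the shapes of `log_pi` and `baselines` are assumptions):

```python
import torch
import torch.nn.functional as F


def hybrid_loss(logits, labels, log_pi, baselines):
    """Assumed shapes: logits [B, C], labels [B], log_pi and baselines [T, B]."""
    # Supervised term: trains glimpse/core/action networks by backprop.
    loss_action = F.cross_entropy(logits, labels)

    # Reward: 1 if the final classification is correct, else 0.
    predicted = logits.argmax(dim=1)
    reward = (predicted == labels).float()             # [B]
    reward = reward.unsqueeze(0).expand_as(baselines)  # [T, B]

    # Baseline regression reduces the variance of the REINFORCE estimate.
    loss_baseline = F.mse_loss(baselines, reward)

    # REINFORCE term: this is the only term that should update the location network.
    advantage = reward - baselines.detach()
    loss_reinforce = torch.mean(torch.sum(-log_pi * advantage, dim=0))

    return loss_action + loss_baseline + loss_reinforce
```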

yxiao54 avatar Mar 25 '20 00:03 yxiao54

Note that, aside from stopping at h_t, the gradient originating from the action_network will also continue recursively through g_t in the core_network and modify the l_t of all previous timesteps. Meanwhile, I wonder why the location_network and baseline_network have to be detached from h_t. Does anywhere in the paper suggest that the core_network is trained only via the classification loss? (See the small experiment below.) @Pozimek @yxiao54
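
A tiny experiment (mine, not from the repo) showing what detaching h_t in those heads changes: with the detach, the baseline/REINFORCE losses leave the core's parameters untouched, so the core is updated only through the paths that stay attached.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

core = nn.Linear(4, 4)      # stand-in for the core RNN
loc_head = nn.Linear(4, 2)  # stand-in for the location/baseline head

h_t = core(torch.randn(1, 4))

# Head reading a detached h_t: its loss cannot reach the core's parameters.
loc_head(h_t.detach()).sum().backward()
print(core.weight.grad)                  # None: no gradient reached the core

# Without the detach, the same loss would also update the core.
loc_head(h_t).sum().backward()
print(core.weight.grad is not None)      # True
```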

lijiangguo avatar Apr 11 '20 15:04 lijiangguo

@Pozimek it seems that l_t is detached in the location network

litingfeng avatar Dec 30 '20 03:12 litingfeng

@Pozimek Hi, your explanation helps me understand why the authors use l_t.detach() in the code, thanks!

lizhenstat avatar Jan 07 '21 07:01 lizhenstat