
Question about unchain_backward

Open gkeskin07 opened this issue 9 years ago • 0 comments

Hi,

Thanks a lot for your char_rnn example. I have a question about your usage of unchain_backward (line 107) in train.py. Assume the batch size is 1 and the code does backprop every bprop_len steps, and suppose we are at timestep bprop_len (the first time backpropagation is performed).
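For context, my understanding of the training loop is roughly the following. This is a hand-rolled sketch under the assumptions above, not your exact train.py; the class and names like `CharRNN`, `n_units`, and `bprop_len` are mine:

```python
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

# Hypothetical minimal char-level RNN; the structure is illustrative,
# not a copy of the model in this repository.
class CharRNN(chainer.Chain):
    def __init__(self, n_vocab, n_units):
        super(CharRNN, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.lstm = L.LSTM(n_units, n_units)
            self.out = L.Linear(n_units, n_vocab)

    def __call__(self, x):
        # L.LSTM keeps its hidden state across calls, so successive
        # timesteps are linked into one computational graph.
        return self.out(self.lstm(self.embed(x)))

model = CharRNN(n_vocab=128, n_units=64)
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)

data = np.random.randint(0, 128, size=1000).astype(np.int32)  # dummy corpus
bprop_len = 35
accum_loss = chainer.Variable(np.zeros((), dtype=np.float32))

for i in range(len(data) - 1):
    x = data[i:i + 1]       # batchsize = 1
    t = data[i + 1:i + 2]
    # The per-step losses are summed, so backward() below propagates
    # the error of every output in the chunk, not just the last one.
    accum_loss += F.softmax_cross_entropy(model(x), t)

    if (i + 1) % bprop_len == 0:
        model.cleargrads()
        accum_loss.backward()          # BPTT through the current chunk
        accum_loss.unchain_backward()  # cut the graph at this point
        optimizer.update()
        accum_loss = chainer.Variable(np.zeros((), dtype=np.float32))
```

With that loop in mind, my two questions: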

  1. When you do backprop at step bprop_len, which gradients are computed? Do you only compute the gradient of the output at step bprop_len with respect to all inputs from step 1 to step bprop_len, or do you compute the gradient of every output from timestep 1 to timestep bprop_len with respect to all inputs from timestep 1 to timestep bprop_len? Ideally it should be the latter, but I am a bit confused about how this is achieved.

  2. When you call unchain_backward at step bprop_len, is all the input/output history before step bprop_len erased? Specifically, when we do the second backprop at timestep 2*bprop_len, do we still have access to the input at timestep (bprop_len - 1)? That input would be needed to compute the gradient of the output at timestep bprop_len + 1, so that the backprop could be carried through. (A toy example of the cut I mean follows below.)
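To make question 2 concrete, here is a toy example of the graph cut I am asking about (again my own minimal sketch, not code from this repository):

```python
import numpy as np
import chainer

# Three "timesteps" chained by multiplication, with a cut after step 2.
x = chainer.Variable(np.array([3.0], dtype=np.float32))
w = chainer.Variable(np.array([2.0], dtype=np.float32))

h1 = x * w               # step 1
h2 = h1 * w              # step 2
h2.unchain_backward()    # h2 keeps its value but forgets its history

y = h2 * w               # step 3, built on the retained value of h2
y.backward()             # grad of a size-1 variable is auto-set to 1

print(w.grad)  # [12.] -- only the step-3 contribution, d(y)/d(w) = h2
print(x.grad)  # None  -- the path back to x was severed by the cut
```

In this toy case the value of h2 survives the cut, so step 3 can still run forward; my question is whether the same holds for the inputs and hidden states of the RNN between chunks.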

Thanks

gkeskin07 · Nov 02 '15 19:11