
Clip rnn gradients in a chunk-wise manner

Open · yaozengwei opened this issue 3 years ago • 1 comment

This PR aims to clip the RNN gradients in a chunk-wise manner, to address the gradient explosion problem in the backward pass. When processing each chunk, we clip the gradients of the hidden states and cell states that are passed between chunks.

The gradient clipping strategy applied to the hidden states and cell states is as follows (a code sketch follows the list):

  1. If the gradient norm is larger than a specific threshold, we directly zero the gradients.
  2. Scale the gradient down by a factor of 0.9.
  3. Limit the gradient norm to a maximum value.
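
Below is a minimal PyTorch sketch of the idea, not the code from this PR: an autograd function that is the identity in the forward pass and applies the three steps above to the incoming gradient in the backward pass, plus a chunk-wise LSTM loop that routes the states passed between chunks through it. The names (`ChunkGradClip`, `chunked_lstm_forward`) and the concrete threshold/factor values are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ChunkGradClip(torch.autograd.Function):
    """Identity in the forward pass; clips the incoming gradient in backward."""

    @staticmethod
    def forward(ctx, x, zero_threshold, scale_factor, max_norm):
        ctx.zero_threshold = zero_threshold
        ctx.scale_factor = scale_factor
        ctx.max_norm = max_norm
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        # Step 1: if the gradient norm exceeds the threshold, zero it out.
        norm = grad.norm()
        if norm > ctx.zero_threshold:
            return torch.zeros_like(grad), None, None, None
        # Step 2: scale the gradient down by a constant factor (e.g. 0.9).
        grad = grad * ctx.scale_factor
        # Step 3: limit the gradient norm to a maximum.
        norm = grad.norm()
        if norm > ctx.max_norm:
            grad = grad * (ctx.max_norm / (norm + 1e-20))
        return grad, None, None, None


def chunked_lstm_forward(lstm: nn.LSTM, x: torch.Tensor, chunk_size: int = 20):
    """Run an LSTM chunk by chunk; x has shape (seq_len, batch, input_size).

    The hidden/cell states handed from one chunk to the next go through
    ChunkGradClip, so only their gradients are clipped; the forward pass
    is unchanged.
    """
    outputs = []
    states = None
    for start in range(0, x.size(0), chunk_size):
        out, (h, c) = lstm(x[start:start + chunk_size], states)
        h = ChunkGradClip.apply(h, 10.0, 0.9, 1.0)  # illustrative values
        c = ChunkGradClip.apply(c, 10.0, 0.9, 1.0)
        states = (h, c)
        outputs.append(out)
    return torch.cat(outputs, dim=0)


# Tiny usage example:
lstm = nn.LSTM(input_size=80, hidden_size=256)
x = torch.randn(100, 4, 80, requires_grad=True)
y = chunked_lstm_forward(lstm, x, chunk_size=20)
y.sum().backward()
```

Note that, unlike truncated BPTT, the states are not detached between chunks here: gradients still flow across chunk boundaries and are only clipped.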

yaozengwei · Aug 31 '22 03:08

Now I am running experiments with the following options (a sketch of these flags follows the list):

  • --rnn-clip-grad 1 --rnn-chunk-size 20 --rnn-grad-scale-factor 1.0 --rnn-grad-max-norm 0.5
  • --rnn-clip-grad 1 --rnn-chunk-size 20 --rnn-grad-scale-factor 1.0 --rnn-grad-max-norm 1.0
  • --rnn-clip-grad 1 --rnn-chunk-size 20 --rnn-grad-scale-factor 1.0 --rnn-grad-max-norm 2.0
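
A hedged sketch of how such flags might be declared in a training script is shown below; the defaults simply mirror the first configuration listed above, and the help strings are a guess at the semantics (for instance, --rnn-grad-scale-factor and --rnn-grad-max-norm plausibly map to steps 2 and 3 of the clipping strategy), not the PR's actual parser code.

```python
# Hypothetical flag declarations; not taken from the PR's training script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--rnn-clip-grad", type=int, default=1,
                    help="If nonzero, clip RNN state gradients chunk by chunk.")
parser.add_argument("--rnn-chunk-size", type=int, default=20,
                    help="Number of frames per chunk for chunk-wise processing.")
parser.add_argument("--rnn-grad-scale-factor", type=float, default=1.0,
                    help="Factor used to scale down the state gradients (step 2).")
parser.add_argument("--rnn-grad-max-norm", type=float, default=0.5,
                    help="Maximum allowed norm of the state gradients (step 3).")

args = parser.parse_args(["--rnn-grad-max-norm", "1.0"])
print(args.rnn_clip_grad, args.rnn_chunk_size,
      args.rnn_grad_scale_factor, args.rnn_grad_max_norm)
# 1 20 1.0 1.0
```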

yaozengwei · Sep 04 '22 06:09