Clip RNN gradients in a chunk-wise manner
This PR aims to clip the RNN gradients in a chunk-wise manner, to address the gradient explosion problem in the backward pass. When computing each chunk, we clip the gradients of the hidden states and cell states that are passed between chunks (see the sketch below).
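Conceptually, the chunk-wise processing looks roughly like the following. This is a minimal sketch, not the PR's actual code: the names `chunk_wise_forward` and `clip_to_max_norm` are hypothetical, and only the norm-clamping rule is shown here; the full strategy is described next.

```python
import torch
import torch.nn as nn


def clip_to_max_norm(grad: torch.Tensor, max_norm: float = 1.0) -> torch.Tensor:
    # Clamp the norm of a state gradient to max_norm.
    norm = grad.norm()
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad


def chunk_wise_forward(lstm: nn.LSTM, x: torch.Tensor, chunk_size: int = 20):
    """x: (seq_len, batch, input_size); returns (seq_len, batch, hidden_size)."""
    outputs = []
    states = None
    for chunk in x.split(chunk_size, dim=0):
        out, (h, c) = lstm(chunk, states)
        if h.requires_grad:
            # Hooks fire during the backward pass, clipping the gradients
            # of the states passed back across the chunk boundary.
            h.register_hook(clip_to_max_norm)
            c.register_hook(clip_to_max_norm)
        states = (h, c)
        outputs.append(out)
    return torch.cat(outputs, dim=0)
```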
The gradient clipping strategy applied to the hidden states and cell states is as follows (a combined sketch is given after the list):
- Zero the gradients directly if the gradient norm is larger than a specific threshold.
- Scale the gradients down by a factor of 0.9.
- Limit the gradient norm to a maximum value.
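Combining the three rules, the backward hook could look roughly like this. This is a hedged sketch, not the PR's implementation: the function name `clip_state_grad`, the threshold value, and the exact order in which the rules are combined are assumptions; the defaults echo the `--rnn-grad-scale-factor` and `--rnn-grad-max-norm` options used below.

```python
import torch


def clip_state_grad(
    grad: torch.Tensor,
    zero_threshold: float = 10.0,  # assumed value for the "zero" rule
    scale_factor: float = 0.9,
    max_norm: float = 1.0,
) -> torch.Tensor:
    norm = grad.norm()
    # Rule 1: if the norm exceeds the threshold, drop the gradient entirely.
    if norm > zero_threshold:
        return torch.zeros_like(grad)
    # Rule 2: scale the gradient down by a constant factor.
    grad = grad * scale_factor
    norm = norm * scale_factor
    # Rule 3: clamp the (scaled) gradient norm to a maximum.
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```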
I am now running experiments with the following options:
- `--rnn-clip-grad 1 --rnn-chunk-size 20 --rnn-grad-scale-factor 1.0 --rnn-grad-max-norm 0.5`
- `--rnn-clip-grad 1 --rnn-chunk-size 20 --rnn-grad-scale-factor 1.0 --rnn-grad-max-norm 1.0`
- `--rnn-clip-grad 1 --rnn-chunk-size 20 --rnn-grad-scale-factor 1.0 --rnn-grad-max-norm 2.0`
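For reference, these options could be registered with a plain `argparse` setup along these lines. This is a hypothetical sketch; the actual icefall training script may define them with different types, defaults, or help text.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--rnn-clip-grad", type=int, default=1,
                    help="Whether to clip RNN state gradients chunk-wise.")
parser.add_argument("--rnn-chunk-size", type=int, default=20,
                    help="Number of frames per chunk in the forward pass.")
parser.add_argument("--rnn-grad-scale-factor", type=float, default=1.0,
                    help="Constant factor applied to state gradients.")
parser.add_argument("--rnn-grad-max-norm", type=float, default=1.0,
                    help="Maximum allowed norm of the state gradients.")
args = parser.parse_args()
```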