BN is suggested to be applied immediately before RELU, not after.

Open chenjiasheng opened this issue 7 years ago • 2 comments

https://github.com/zzw922cn/Automatic_Speech_Recognition/blob/545a1981dbc705d6f8312650a9d5a290ee065f8a/models/deepSpeech2.py#L73

As in (Laurent et al., 2015), there are two ways of applying BatchNorm to the recurrent operation. A natural extension is to insert a BatchNorm transformation, B(·), immediately before every non-linearity as follows:

h[l, t] = f(B(W[l]*h[l-1, t] + U[l]*h[l, t-1]))

In this case the mean and variance statistics are accumulated over a single time-step of the minibatch. We did not find this to be effective. An alternative (sequence-wise normalization) is to batch normalize only the vertical connections. The recurrent computation is given by:

h[l, t] = f(B(W[l]*h[l-1, t]) + U[l]*h[l, t-1])

So should we set the activation of rnn_cell to None and apply the ReLU activation immediately after BN instead?
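To make the ordering concrete, here is a minimal pure-Python sketch of the sequence-wise variant, h[t] = f(B(w*x[t]) + u*h[t-1]), for a single hidden unit. It is not the repository's code: the names `batch_norm` and `rnn_sequence_wise` are hypothetical, the weights are scalars, and the learned scale and shift of BatchNorm are left out. It only illustrates that B is applied to the vertical connection and that ReLU comes after it.

```python
import math


def relu(x):
    return max(0.0, x)


def batch_norm(values, eps=1e-5):
    """Normalize a flat list of scalars to zero mean and unit variance.

    Assumption: no learned gamma/beta, to keep the sketch short.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def rnn_sequence_wise(x, w, u):
    """Compute h[t] = relu(B(w * x[t]) + u * h[t-1]) for one hidden unit.

    x is a batch of equal-length sequences, indexed as x[b][t].
    The statistics for B are taken over the whole minibatch and all
    time steps (sequence-wise normalization).
    """
    batch, steps = len(x), len(x[0])
    # Vertical (input-to-hidden) term, normalized before the non-linearity.
    flat = [w * x[b][t] for b in range(batch) for t in range(steps)]
    normed = batch_norm(flat)
    h = [[0.0] * steps for _ in range(batch)]
    for b in range(batch):
        prev = 0.0
        for t in range(steps):
            # The recurrent term u * h[t-1] is added without normalization.
            pre = normed[b * steps + t] + u * prev
            # ReLU is applied only here, after BN.
            prev = relu(pre)
            h[b][t] = prev
    return h


h = rnn_sequence_wise([[1.0, 2.0], [3.0, 4.0]], w=1.0, u=0.5)
```

In the current layout the cell applies its activation first and BN sees the already-rectified output; in this sketch the cell would be linear (activation set to None) and the non-linearity would follow the normalization.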

chenjiasheng, Aug 30 '17 08:08

@chenjiasheng Hi, thanks for your comment. I'll check it later. Pull requests are welcome.

zzw922cn, Aug 31 '17 01:08

@zzw922cn I would like to submit a PR, but I can't because of my company's proxy and upload limits. Sorry.

chenjiasheng, Sep 01 '17 02:09