Automatic_Speech_Recognition
BN is suggested to be applied immediately before ReLU, not after.
https://github.com/zzw922cn/Automatic_Speech_Recognition/blob/545a1981dbc705d6f8312650a9d5a290ee065f8a/models/deepSpeech2.py#L73
As in (Laurent et al., 2015), there are two ways of applying BatchNorm to the recurrent operation. A natural extension is to insert a BatchNorm transformation, B(), immediately before every non-linearity as follows: h[l, t] = f(B(W[l]*h[l-1, t] + U[l]*h[l, t-1]))
In this case the mean and variance statistics are accumulated over a single time-step of the minibatch. We did not find this to be effective. An alternative (sequence-wise normalization) is to batch normalize only the vertical connections. The recurrent computation is given by h[l, t] = f(B(W[l]*h[l-1, t]) + U[l]*h[l, t-1])
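To make the difference concrete, here's a rough NumPy sketch of the two variants quoted above (not the repo's TensorFlow code; `batch_norm` below is a bare normalization with no learned gamma/beta or running statistics, and weights are right-multiplied):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Bare batch normalization over the batch axis; the learned
    # gamma/beta and running statistics are omitted for brevity.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(z):
    return np.maximum(z, 0.0)

# One recurrent step; h_below is h[l-1, t], h_prev is h[l, t-1],
# both shaped (batch, features).

def step_framewise(h_below, h_prev, W, U):
    # Variant 1: BN over the full pre-activation, recurrent term
    # included -- the one the paper found ineffective.
    return relu(batch_norm(h_below @ W + h_prev @ U))

def step_sequencewise(h_below, h_prev, W, U):
    # Variant 2 (sequence-wise): BN only on the vertical connection;
    # the recurrent term U*h[l, t-1] bypasses normalization.
    return relu(batch_norm(h_below @ W) + h_prev @ U)
```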
So should we set the `activation` of the `rnn_cell` to `None` and move the ReLU activation to immediately after BN?
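If so, the restructured step might look roughly like this plain-NumPy sketch (illustrative only, not a drop-in for deepSpeech2.py; the function name and shapes are my own):

```python
import numpy as np

def bn_relu_rnn_layer(x, W, U, eps=1e-5):
    """One RNN layer with a linear cell, sequence-wise BN, then ReLU.

    x: (T, B, D) inputs, W: (D, H), U: (H, H).  Because the normalized
    term BN(W*h[l-1, t]) does not depend on the recurrence, it can be
    precomputed for the whole minibatch sequence; the ReLU still has
    to sit inside the time loop, since h[l, t] feeds step t+1.
    """
    z = x @ W                                   # (T, B, H) vertical pre-activations
    mean = z.mean(axis=(0, 1), keepdims=True)   # stats over time and batch
    var = z.var(axis=(0, 1), keepdims=True)
    z = (z - mean) / np.sqrt(var + eps)         # BN sans learned gamma/beta
    h = np.zeros((x.shape[1], U.shape[0]))      # h[l, 0] = 0
    outputs = []
    for t in range(x.shape[0]):
        h = np.maximum(z[t] + h @ U, 0.0)       # ReLU applied after BN
        outputs.append(h)
    return np.stack(outputs)                    # (T, B, H)
```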
@chenjiasheng Hi, thanks for your comment, I'll check it later. You're welcome to open a pull request.
@zzw922cn I'd like to open a PR, but I can't because of my company's proxy and upload restrictions. Sorry.