ba-dls-deepspeech
Different decode results with decode batch_size=1 and batch_size>1
Decoding the same utterance gives different results with a decode batch size of 1 and a decode batch size of 16.
With a decode batch size of 1, the argmax of the network output looks like this:
blank C C blank A B B Z D blank A blank blank blank T T blank
Decoding from this argmax sequence gives: cabzdat
but the ground truth is: cat
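For reference, here is a minimal sketch of how the frame-level argmax above turns into "cabzdat", assuming standard greedy CTC decoding (merge consecutive repeats, then drop blanks). The function name `ctc_greedy_collapse` and the `"blank"` label are hypothetical and are not taken from the repository.

```python
from itertools import groupby

BLANK = "blank"  # hypothetical label for the CTC blank symbol


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame argmax sequence: merge repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank).lower()


frames = "blank C C blank A B B Z D blank A blank blank blank T T blank".split()
print(ctc_greedy_collapse(frames))  # cabzdat
```

So the extra characters are already present in the per-frame argmax; the collapsing step is not what introduces them.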
When I decode the same utterance with batch_size=16 (the test JSON contains more than 2 utterances), the result is just "cat".
Why does this happen?
Many thanks, Xin.q.
This may be due to the batch-normalization layers. Could you retrain another network without batch normalization? Keras is now at 2.0 and no longer supports the `mode` flag. You could also try upgrading to that; this tutorial is quite old.
@srvinay @xinq2016
I encountered the same problem. During training, the `mb_size` parameter (mini-batch size) defaults to 16, but at test time the predictions differ if `mb_size` is changed to other values, such as 1 or 8.
I thought that setting `mode` to 0 would solve the problem, but experiments show that it does not.
mode: integer, 0, 1 or 2.
- 0: feature-wise normalization. Each feature map in the input will be normalized separately. The axis on which to normalize is specified by the `axis` argument. Note that if the input is a 4D image tensor using Theano conventions (samples, channels, rows, cols) then you should set `axis` to `1` to normalize along the channels axis. During training we use per-batch statistics to normalize the data, and during testing we use running averages computed during the training phase.
- 1: sample-wise normalization. This mode assumes a 2D input.
- 2: feature-wise normalization, like mode 0, but using per-batch statistics to normalize the data during both testing and training.
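To illustrate why the choice of statistics matters here, below is a toy sketch in plain Python (not Keras code, and not the project's network). It normalizes a single scalar feature either with per-batch statistics or with fixed running averages. All names and numbers (`batch_norm`, `running_mean`, `running_var`, the sample values) are made up for illustration.

```python
EPS = 1e-5  # small constant for numerical stability, value assumed


def batch_norm(values, running_mean, running_var, use_batch_stats):
    """Normalize one scalar feature across a batch of activations."""
    if use_batch_stats:
        # per-batch statistics: depend on everything else in the batch
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    else:
        # running averages: fixed after training, independent of the batch
        mean, var = running_mean, running_var
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]


running_mean, running_var = 0.0, 1.0  # made-up training-time averages
utt = 2.0                             # activation of the utterance of interest
batch = [utt, -1.0, 0.5, 3.0]         # same utterance plus three others

alone_batch_stats = batch_norm([utt], running_mean, running_var, True)[0]
in_batch_batch_stats = batch_norm(batch, running_mean, running_var, True)[0]
alone_running = batch_norm([utt], running_mean, running_var, False)[0]
in_batch_running = batch_norm(batch, running_mean, running_var, False)[0]

print(alone_batch_stats, in_batch_batch_stats)  # differ with batch size
print(alone_running, in_batch_running)          # identical
```

With per-batch statistics the output for the same utterance changes with the batch it sits in, which would match the symptom above if the layers are still normalizing with batch statistics at test time. With running averages the output does not depend on the batch size.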
The version of Keras used is 1.1.2. If I upgrade Keras to 2.0, how do I modify the code? I would be very grateful for a code snippet. Right now I do not know which code in the project needs to be modified if Keras is upgraded.
@xf4fresh Besides dropping `mode` on v2, have you tried setting the learning phase to False during test? https://github.com/baidu-research/ba-dls-deepspeech/blob/master/visualize.py#L40