icefall
Use padding_idx=None for nn.Embedding() in the decoder model
We need to change sherpa/sherpa-onnx/sherpa-ncnn to use [-1, 0] as the initial tokens during decoding instead of [0, 0].
The consequence is that the model needs to be re-exported; otherwise decoding will fail with a runtime error:
IndexError: index out of range in self
If we don't change sherpa/sherpa-onnx/sherpa-ncnn, the WER becomes worse with the new models, as a user has reported.
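To see why an unmodified export fails on [-1, 0], here is a minimal sketch, in plain Python, of what an embedding lookup does with its indices. The `embedding_lookup` helper and the toy table are hypothetical stand-ins; the point is only that -1 is out of range for a bounds-checked lookup unless the exported model explicitly maps it to a zero vector first.

```python
# Sketch of why feeding [-1, 0] to an *unmodified* exported decoder fails:
# an embedding lookup bounds-checks its indices, and -1 is out of range
# unless the runtime explicitly maps it to a zero vector first.

def embedding_lookup(token, table):
    # Mirrors nn.Embedding's behaviour: indices must be in [0, vocab_size).
    if not 0 <= token < len(table):
        raise IndexError("index out of range in self")
    return table[token]

vocab = [[0.1, 0.2], [0.3, 0.4]]     # toy 2-token embedding table

msg = None
try:
    embedding_lookup(-1, vocab)      # the new initial token -1
except IndexError as e:
    msg = str(e)

assert msg == "index out of range in self"
```

A re-exported model avoids this by handling the -1 sentinel inside the exported graph, so the runtime never indexes the embedding table with a negative value.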
I don't understand: we train the model using [0, 0] as the initial context, so why do we decode with [-1, 0]?
blank_id = self.decoder.blank_id
sos_y = add_sos(y, sos_id=blank_id)
# sos_y_padded: [B, S + 1], start with SOS.
sos_y_padded = sos_y.pad(mode="constant", padding_value=blank_id)
https://github.com/k2-fsa/icefall/blob/10a234709cb6b8aa5e99b1c18140b49db7b6faca/egs/librispeech/ASR/zipformer/model.py#L208
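For reference, here is a plain-Python sketch of what the quoted lines compute, assuming the usual k2 semantics (`add_sos` prepends the SOS id to every utterance; `.pad(...)` right-pads the ragged batch into a rectangular [B, S + 1] array). Lists of lists stand in for k2.RaggedTensor, and both helpers below are simplified mimics, not the real k2 API.

```python
# Plain-Python mimic of the quoted training code: add_sos prepends the
# SOS id (here, blank) to each utterance, and pad right-pads the ragged
# batch into a rectangular [B, S + 1] array.

def add_sos(y, sos_id):
    return [[sos_id] + utt for utt in y]

def pad(y, padding_value):
    max_len = max(len(utt) for utt in y)
    return [utt + [padding_value] * (max_len - len(utt)) for utt in y]

blank_id = 0
y = [[5, 7, 9], [3, 4]]          # token ids for a batch of two utterances

sos_y = add_sos(y, sos_id=blank_id)
sos_y_padded = pad(sos_y, padding_value=blank_id)

# Every row starts with blank (0), which is why one might expect decoding
# to also start from [0, 0] -- the question raised above.
assert sos_y_padded == [[0, 5, 7, 9], [0, 3, 4, 0]]
```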
Please think about the input of the conv module in the decoder model: during training, its input sequence is left-padded with zero vectors, so the first context position is a zero vector, not the embedding of blank.
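The hint above can be made concrete with a small sketch. It assumes two things stated in this thread: (a) during training the decoder's conv input is left-padded with zeros, so the first prediction sees [zero vector, emb(blank)]; and (b) the runtime maps the -1 sentinel to a zero embedding. The embedding table, `embed`, and `conv_input_at_first_step` below are toy stand-ins, not the real icefall/sherpa code.

```python
# Toy decoder front-end with context_size = 2, showing why the initial
# decoding tokens changed from [0, 0] to [-1, 0] after the switch to
# padding_idx=None. Plain lists stand in for tensors; "conv input" is just
# the concatenation of the two context embeddings the Conv1d would see.

EMB_DIM = 2

def make_table(padding_idx):
    # Toy embedding table. With padding_idx=0 (the old setting), PyTorch
    # forces row 0 to stay all-zero; with padding_idx=None it is trained
    # and, in general, non-zero.
    table = {0: [0.3, -0.2], 1: [1.0, 0.5], 2: [-0.4, 0.8]}
    if padding_idx is not None:
        table[padding_idx] = [0.0] * EMB_DIM
    return table

def embed(token, table):
    # -1 is the sentinel for "no history": the runtime maps it to the zero
    # vector, exactly like the zero left-padding used during training.
    if token == -1:
        return [0.0] * EMB_DIM
    return table[token]

def conv_input_at_first_step(context_tokens, table):
    # What the decoder's conv module sees when predicting the first symbol.
    left, right = context_tokens
    return embed(left, table) + embed(right, table)

def training_first_step(table):
    # Training left-pads the conv input with zeros, and the first real
    # token is blank/SOS (0): the window is [zero vector, emb(0)].
    return [0.0] * EMB_DIM + table[0]

# Old model (padding_idx=0): emb(0) is all-zero, so [0, 0] matches training.
old = make_table(padding_idx=0)
assert conv_input_at_first_step([0, 0], old) == training_first_step(old)

# New model (padding_idx=None): emb(0) is non-zero, so [0, 0] no longer
# matches what the model saw in training, but [-1, 0] does.
new = make_table(padding_idx=None)
assert conv_input_at_first_step([0, 0], new) != training_first_step(new)
assert conv_input_at_first_step([-1, 0], new) == training_first_step(new)
```

In short: [0, 0] only happened to match training when padding_idx=0 forced the blank embedding to be zero; with padding_idx=None, the -1 sentinel is needed to reproduce the zero left-padding the model saw during training.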