icefall

Use padding_idx=None for nn.Embedding() in the decoder model

Status: Open. csukuangfj opened this issue 1 year ago (2 comments).

We need to change sherpa/sherpa-onnx/sherpa-ncnn to use [-1, 0] instead of [0, 0] as the initial tokens during decoding. The consequence is that we need to re-export the models; otherwise, decoding fails at runtime with the following error:

IndexError: index out of range in self

A user has reported that the WER becomes worse if we don't change sherpa/sherpa-onnx/sherpa-ncnn.

csukuangfj, Aug 08 '23 13:08
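A minimal sketch of the two behaviors discussed above, assuming PyTorch. The vocabulary size, the dimension, and the helper `embed_with_mask` are hypothetical. With `padding_idx=0`, `nn.Embedding` initializes the blank row to zeros and excludes it from the gradient, so the context [0, 0] contributes nothing. With `padding_idx=None`, that row is an ordinary learned vector. A raw -1 is out of range for `nn.Embedding` and raises the `IndexError` quoted in the issue. So -1 only works if the model itself maps negative ids to zero vectors, for example by clamping and masking. That is presumably why models exported without such logic have to be re-exported.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, decoder_dim, blank_id = 5, 4, 0  # toy sizes, chosen for illustration

# Old setup: the blank row is pinned to zeros.
emb_pad = nn.Embedding(vocab_size, decoder_dim, padding_idx=blank_id)
# New setup: every row, including blank, is a learned vector.
emb_free = nn.Embedding(vocab_size, decoder_dim, padding_idx=None)

y = torch.tensor([[blank_id, blank_id]])  # initial decoding context [0, 0]
print(emb_pad(y).abs().sum().item())       # 0.0: blank embeds to zeros
print(emb_free(y).abs().sum().item() > 0)  # True: blank is no longer zero

# A raw -1 is out of range for nn.Embedding (raises IndexError on CPU).
try:
    emb_free(torch.tensor([[-1, blank_id]]))
except IndexError as e:
    print("IndexError:", e)


def embed_with_mask(emb: nn.Embedding, ids: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: clamp negative ids to a valid index,
    # then zero out their embeddings with a boolean mask.
    return emb(ids.clamp(min=0)) * (ids >= 0).unsqueeze(-1)


out = embed_with_mask(emb_free, torch.tensor([[-1, blank_id]]))
print(out[0, 0].abs().sum().item())  # 0.0: the -1 slot contributes nothing
```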

I don't understand: we train the model with [0, 0], so why do we decode with [-1, 0]?

        blank_id = self.decoder.blank_id
        sos_y = add_sos(y, sos_id=blank_id)

        # sos_y_padded: [B, S + 1], start with SOS.
        sos_y_padded = sos_y.pad(mode="constant", padding_value=blank_id)

https://github.com/k2-fsa/icefall/blob/10a234709cb6b8aa5e99b1c18140b49db7b6faca/egs/librispeech/ASR/zipformer/model.py#L208

kamirdin, Dec 17 '23 02:12
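The quoted training code can be illustrated with a plain-Python stand-in for k2's `add_sos` and ragged-tensor `pad`. Here, `add_sos_list` and `pad_rows` are hypothetical helpers rather than k2's API, and the token ids are made up. Each label sequence gets one blank (id 0) prepended as SOS, and shorter rows are right-padded with blank:

```python
from typing import List


def add_sos_list(y: List[List[int]], sos_id: int) -> List[List[int]]:
    # Prepend the SOS symbol (here: blank) to every label sequence.
    return [[sos_id] + seq for seq in y]


def pad_rows(rows: List[List[int]], padding_value: int) -> List[List[int]]:
    # Right-pad each row to the length of the longest row.
    width = max(len(r) for r in rows)
    return [r + [padding_value] * (width - len(r)) for r in rows]


blank_id = 0
y = [[3, 7, 2], [5]]  # toy token ids for a batch of two utterances
sos_y_padded = pad_rows(add_sos_list(y, sos_id=blank_id), padding_value=blank_id)
print(sos_y_padded)  # [[0, 3, 7, 2], [0, 5, 0, 0]]
```

Each row contains only one leading blank. Assuming a decoder with `context_size = 2`, the extra left context needed for the first position is not in this tensor and has to come from somewhere else inside the decoder.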

Please think about the input of the conv module in the decoder model.

csukuangfj, Dec 17 '23 04:12
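The conv-module hint can be checked with a sketch that assumes a decoder shaped roughly like icefall's. In this assumed shape, an embedding is followed by an `nn.Conv1d` over the last `context_size` tokens, and at training time the sequence is left-padded with `context_size - 1` zero frames via `F.pad`. The sizes, the plain (ungrouped) conv, and the helpers `embed` and `decoder_conv` are all assumptions for illustration; later decoder layers are omitted.

With `padding_idx=None`, the first training window is [zeros, embed(blank)]. Decoding from [-1, 0] with clamp-and-mask reproduces that window exactly. Decoding from [0, 0] instead feeds [embed(blank), embed(blank)], which no longer matches because embed(blank) is nonzero.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, dim, blank_id, context_size = 5, 4, 0, 2  # toy values

emb = nn.Embedding(vocab_size, dim, padding_idx=None)  # blank row is learned
conv = nn.Conv1d(dim, dim, kernel_size=context_size, padding=0, bias=False)


def embed(y: torch.Tensor) -> torch.Tensor:
    # Negative ids become zero vectors: clamp to a valid id, then mask.
    return emb(y.clamp(min=0)) * (y >= 0).unsqueeze(-1)


def decoder_conv(y: torch.Tensor, need_pad: bool) -> torch.Tensor:
    x = embed(y).permute(0, 2, 1)  # (N, dim, T) layout expected by Conv1d
    if need_pad:
        # Training path: left-pad the time axis with context_size - 1 zero frames.
        x = F.pad(x, pad=(context_size - 1, 0))
    return conv(x).permute(0, 2, 1)  # back to (N, T, dim)


with torch.no_grad():
    # Training: the decoder input starts with a single blank (SOS).
    train_first = decoder_conv(torch.tensor([[blank_id, 3, 1]]), need_pad=True)[:, 0]
    # Decoding: the full context is supplied explicitly, so no padding.
    dec_neg = decoder_conv(torch.tensor([[-1, blank_id]]), need_pad=False)[:, 0]
    dec_zero = decoder_conv(torch.tensor([[blank_id, blank_id]]), need_pad=False)[:, 0]

print(torch.allclose(train_first, dec_neg))   # True
print(torch.allclose(train_first, dec_zero))  # False
```

Setting `padding_idx=blank_id` instead makes both comparisons print True, because the blank row is then zero as well. That is presumably why [0, 0] worked before this change.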