
Output not matching after exporting updated Zipformer model to Onnx

Open bhaswa opened this issue 2 years ago • 12 comments

Hi, I have trained the latest streaming zipformer model with a custom dataset and exported the model to onnx. When I compare the output from the original pth model and the onnx model, an accuracy gap of 5% is found in the exported onnx model.

bhaswa avatar Jun 29 '23 09:06 bhaswa

an accuracy gap of 5% is found in the exported onnx model

Could you identify the wave files that cause inconsistent recognition results?

If yes, could you use one of them to compute the encoder output and compare whether the encoder output is the same for icefall and sherpa-onnx?
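A minimal sketch of how such a comparison might be done: dump each script's encoder output to disk and diff the arrays offline. The file names and the np.save() placement are assumptions, and synthetic arrays stand in for the real dumps here.

```python
import numpy as np

# Stand-in for the PyTorch script: in practice you would call something like
# np.save("encoder_out_torch.npy", encoder_out.detach().cpu().numpy())
# right after the encoder forward pass.
torch_out = np.random.randn(1, 16, 256).astype(np.float32)
np.save("encoder_out_torch.npy", torch_out)

# Stand-in for onnx_pretrained-streaming.py: save its encoder output the
# same way; a tiny perturbation here simulates float32 numeric drift.
np.save("encoder_out_onnx.npy", torch_out + np.float32(1e-6))

# Offline comparison of the two dumps:
a = np.load("encoder_out_torch.npy")
b = np.load("encoder_out_onnx.npy")
print("shapes:", a.shape, b.shape)
print("max abs diff:", np.abs(a - b).max())
```

If the shapes already differ, there is no point comparing values yet; fix the shape mismatch first.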

csukuangfj avatar Jun 29 '23 09:06 csukuangfj

Btw, I calculated the accuracy of onnx model using ./zipformer/onnx_pretrained-streaming.py, not sherpa-onnx.

bhaswa avatar Jun 29 '23 09:06 bhaswa

Btw, I calculated the accuracy of onnx model using ./zipformer/onnx_pretrained-streaming.py, not sherpa-onnx.

That is also ok. It is much easier to get the encoder output with ./zipformer/onnx_pretrained-streaming.py.

csukuangfj avatar Jun 29 '23 10:06 csukuangfj

@csukuangfj The output from the encoder layer does not match. I checked two audios: for one, the recognition result is the same; for the other, it is different. In both cases the encoder output does not match.

bhaswa avatar Jul 04 '23 05:07 bhaswa

@csukuangfj Any update on this?

bhaswa avatar Jul 07 '23 06:07 bhaswa

the output from the encoder layer does not match

How large is the difference? If the input is the same, the encoder output should also be the same within some numeric tolerance.

csukuangfj avatar Jul 07 '23 07:07 csukuangfj

I double checked the output. The outputs from the encoder layer are completely different. In fact, the dimensions do not match.

Dimension for pth: 1 x 16 x 256
Dimension for onnx: 1 x 16 x 512

bhaswa avatar Jul 07 '23 10:07 bhaswa

I double checked the output. The outputs from the encoder layer are completely different. In fact, the dimensions do not match.

Dimension for pth: 1 x 16 x 256

Dimension for onnx: 1 x 16 x 512

Please apply the joiner.encoder_proj layer to the PyTorch output (the one whose dim is 256) so that it becomes 512.

The ONNX version invokes joiner.encoder_proj automatically.

csukuangfj avatar Jul 07 '23 11:07 csukuangfj
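The suggested projection step can be sketched as follows. This is a minimal stand-in that assumes, from the dimensions above, that joiner.encoder_proj is a Linear(256, 512); the real layer (with trained weights) comes from the checkpoint's joiner module.

```python
import torch

# Hypothetical stand-in for the trained joiner.encoder_proj layer,
# assumed to map the 256-dim encoder output to the 512-dim joiner space.
encoder_proj = torch.nn.Linear(256, 512)

encoder_out = torch.randn(1, 16, 256)   # PyTorch encoder output
projected = encoder_proj(encoder_out)   # now shape-compatible with the ONNX output

print(projected.shape)  # torch.Size([1, 16, 512])
```

Only after this projection do the two outputs live in the same space and become directly comparable.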

After applying the joiner.encoder_proj layer after the encoder, the dimensions now match, but the values are still different.

bhaswa avatar Jul 07 '23 13:07 bhaswa

but the values are still different.

How large is the difference? You can use (a - b).abs().max() to get the max difference.

csukuangfj avatar Jul 07 '23 13:07 csukuangfj
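As a quick sanity check, that comparison might look like this, with synthetic tensors standing in for the real (projected) encoder outputs:

```python
import torch

a = torch.randn(1, 16, 512)             # e.g. projected PyTorch encoder output
b = a + 1e-6 * torch.randn(1, 16, 512)  # e.g. ONNX output with small numeric drift

max_diff = (a - b).abs().max().item()
print(f"max abs diff: {max_diff:.3e}")

# Differences at float32 noise level (~1e-5 or below) are expected;
# anything large points to a real mismatch in inputs or streaming states.
print("close:", torch.allclose(a, b, atol=1e-4))
```

If the max difference is large, the next thing to check is whether both runs really received the same features and initial states.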

  1. The number of times the encoder is called in pth inference is different from onnx inference. All the code used is streaming, FYI.

  2. For a 0.5 sec audio, pth calls the encoder 2 times, whereas in onnx it is called only 1 time.
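One hypothetical explanation is different chunking or tail padding between the two inference loops. The arithmetic can be sketched as below; the frame shift, chunk size, and padding values are assumptions for illustration, not the actual Zipformer defaults.

```python
import math

def num_chunks(duration_s, frame_shift_ms=10, chunk_frames=64, pad_frames=0):
    """How many encoder calls a streaming loop would make (assumed parameters)."""
    frames = int(duration_s * 1000 / frame_shift_ms) + pad_frames
    return math.ceil(frames / chunk_frames)

# With tail padding, one loop can make an extra encoder call for the same audio:
print(num_chunks(0.5))                 # 50 frames  -> 1 chunk
print(num_chunks(0.5, pad_frames=30))  # 80 frames  -> 2 chunks
```

So a 2-vs-1 call count for the same 0.5 s audio could simply mean one script pads the tail (or uses a smaller chunk) while the other does not; comparing per-chunk outputs only makes sense once the chunking matches.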

sanjuktasr avatar Jul 11 '23 06:07 sanjuktasr