icefall
[Streaming Conformer] Causal ConvolutionModule -> streaming ONNX/torch results mismatch
Hi,
I've tried to convert a Conformer encoder to ONNX for streaming using parts of sherpa's export script https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/export_onnx.py. With `causal=False`, the mean difference between the torch and ONNX outputs is around -1e-8 (depending on the input, of course), but with `causal=True` it jumps to 0.001 (and sometimes more), which is far too large to be useful.
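As an aside, a signed mean can cancel positive and negative errors, so the elementwise maximum is usually a more telling metric. A minimal helper (hypothetical, not from sherpa's script) for comparing the two outputs:

```python
import numpy as np

def report_mismatch(torch_out, onnx_out):
    """Compare two output arrays. The signed mean can cancel out,
    so also report the max absolute elementwise difference."""
    torch_out = np.asarray(torch_out, dtype=np.float64)
    onnx_out = np.asarray(onnx_out, dtype=np.float64)
    diff = torch_out - onnx_out
    return diff.mean(), np.abs(diff).max()
```

If the max absolute difference is large while the mean stays near zero, the mismatch is worse than the mean suggests.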
As far as I can tell, the differences that `causal=True` introduces are:

- padding in `depthwise_conv` is set to 0
- the `cache` is concatenated with the current input before `depthwise_conv`
- the `cache` is updated to `x[-self.lorder:]`

So I can't really see what ONNX might not like. Maybe I'm missing something. Thanks in advance for any help.
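To make the three differences concrete, here is a numpy sketch (not the actual icefall code) of the streaming cache logic. With a zero initial cache, processing chunk by chunk should exactly reproduce the non-streaming causal convolution, which is the invariant the ONNX export has to preserve:

```python
import numpy as np

def depthwise_conv1d(x, w):
    """Per-channel 'valid' 1-D convolution. x: (C, T_padded), w: (C, K)."""
    C, T = x.shape
    K = w.shape[1]
    out = np.empty((C, T - K + 1))
    for c in range(C):
        for t in range(T - K + 1):
            out[c, t] = np.dot(x[c, t:t + K], w[c])
    return out

def causal_conv_step(x, cache, w):
    """One streaming step: prepend the cached left context, convolve,
    and keep the last lorder = K - 1 frames as the new cache."""
    padded = np.concatenate([cache, x], axis=1)  # cache ++ current chunk
    y = depthwise_conv1d(padded, w)
    new_cache = padded[:, -(w.shape[1] - 1):]    # analogue of x[-self.lorder:]
    return y, new_cache
```

The non-streaming causal path is equivalent to left-padding with `lorder` zeros and running a valid convolution; concatenating the per-chunk streaming outputs should match it bit for bit.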
I suggest that you can export only some part of the model to onnx at a time, verify that the export works, and then export another part. Repeat it until the whole model is exported.
We have not tried to export a streaming conformer via onnx yet.
My bad, there is a recipe streaming_conformer_ctc, so the title was poorly chosen I guess.
But what I mean is: you have the recipe librispeech/ASR/pruned_transducer_stateless2; from that I take only the conformer and try to convert it to ONNX for streaming using sherpa's script. I prepared a small snippet to visualize it better.
As you can tell, the real difference from sherpa's script is using non-zero inputs when converting the model.
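For reference, "non-zero inputs" here means tracing with random dummy caches instead of `torch.zeros`. A sketch under assumed shapes (all names and dimensions below are made up for illustration, not taken from icefall):

```python
import torch

# Hypothetical shapes -- the real cache shapes depend on the conformer
# configuration; these are placeholders for illustration only.
chunk = torch.rand(1, 16, 80)        # (N, T, num_features), non-zero
conv_cache = torch.rand(1, 256, 30)  # depthwise-conv left-context cache
attn_cache = torch.rand(1, 64, 256)  # attention left-context cache

# Passing non-zero caches to torch.onnx.export exercises the same code
# path that streaming inference uses; all-zero dummies can mask
# mismatches in how the cached left context is consumed:
# torch.onnx.export(encoder, (chunk, attn_cache, conv_cache),
#                   "encoder.onnx", opset_version=13)
```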