
[Streaming Conformer] Causal ConvolutionModule -> streaming ONNX/torch results mismatch

Open MicKot opened this issue 3 years ago • 2 comments

Hi, I've tried to convert a Conformer encoder for streaming to ONNX using parts of sherpa's script https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/export_onnx.py. If I set the model to causal=False, the mean difference between the torch and ONNX outputs is around 1e-8 (depending on the input, of course), but with causal=True it jumps to 0.001 (and sometimes more), which is far too large to be useful. As far as I can tell, the differences causal=True introduces are:

  • padding in depthwise_conv = 0
  • concat of cache with current input before depthwise_conv
  • setting cache to x[-self.lorder:]

So I can't really see what ONNX might not like. Maybe I'm missing something. Thanks in advance for the help.
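To make the three changes above concrete, here is a minimal sketch of a causal depthwise convolution with a left-context cache (a hypothetical stand-in, not the actual icefall ConvolutionModule): padding is 0, the cache is concatenated before the conv, and the new cache is the last `lorder` frames of the input.

```python
import torch
import torch.nn as nn

class CausalDepthwiseConv(nn.Module):
    """Minimal sketch of a causal depthwise conv with a streaming cache.
    Hypothetical illustration, not the actual icefall ConvolutionModule."""

    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        # Causal: no built-in padding; left context comes from the cache.
        self.lorder = kernel_size - 1
        self.conv = nn.Conv1d(
            channels, channels, kernel_size, padding=0, groups=channels
        )

    def forward(self, x: torch.Tensor, cache: torch.Tensor):
        # x: (batch, channels, time); cache: (batch, channels, lorder)
        x = torch.cat([cache, x], dim=2)    # prepend cached left context
        new_cache = x[:, :, -self.lorder:]  # keep the last lorder frames
        return self.conv(x), new_cache
```

With a zero initial cache, processing a sequence in chunks while threading the cache through should reproduce the full-sequence output exactly, which is a useful sanity check before involving ONNX at all.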

MicKot avatar Nov 04 '22 14:11 MicKot

I suggest exporting only part of the model to ONNX at a time, verifying that the export works, and then exporting another part. Repeat until the whole model is exported.

We have not tried to export a streaming conformer to ONNX yet.

csukuangfj avatar Nov 04 '22 15:11 csukuangfj

My bad, there is already a recipe called streaming_conformer_ctc, so my title is misleading, I guess.

But what I mean is: from the librispeech/ASR/pruned_transducer_stateless2 recipe I take only the Conformer and try to convert it to ONNX for streaming using sherpa's script. I prepared a small snippet to better visualize it. As you can tell, the real difference from sherpa is using non-zero inputs when converting the model.
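The snippet itself is not reproduced here, but the point about non-zero inputs at export time can be illustrated with a toy example (hypothetical, not icefall code). Tracing, which torch.onnx.export relies on, records only the branch the dummy inputs happen to take, so exporting with an all-zero cache can silently bake the wrong path into the graph:

```python
import torch
import torch.nn as nn

class BranchOnCache(nn.Module):
    """Toy module (not icefall code) with a data-dependent branch on the cache."""

    def forward(self, x: torch.Tensor, cache: torch.Tensor):
        # Tracing records only the branch taken by the dummy inputs,
        # silently freezing it into the exported graph.
        if bool(cache.abs().sum() == 0):
            return x
        return x + cache

m = BranchOnCache()
x = torch.randn(1, 4)
# Trace with an all-zero cache, as a naive export would do;
# the "return x" branch is baked into the traced graph.
traced = torch.jit.trace(m, (x, torch.zeros(1, 4)))
```

Running the traced module with a non-zero cache then disagrees with eager mode, which is exactly the kind of silent mismatch that only shows up in the causal/streaming configuration.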

MicKot avatar Nov 04 '22 15:11 MicKot