Results: 82 comments of Yuekai Zhang

Hi @Bhuvanesh09, great work! I will take this PR into our internal GitLab to run some CI tests. We will credit your work in the release notes for the distill-whisper model.

This is really nice work! Many thanks, @Eddie-Wang1120. I will import this into our internal GitLab, and hopefully it can be done this week.

> Thank you so much @symphonylyh for the guidelines!
>
> > Lastly, from a practical perspective, the w/o BERT plugin path has a limitation on padding removal -- that is, ...

> So my concern is not whether we can run it. If we infer with `dynamic seq len`, what I observed is that whisper's decoder makes...

> @yuekaizhang I have less background on the Whisper discussion here, but do you mean the current `functional.py::conv2d()` cannot handle dynamic axes due to the `output.view(concat([output.size(1), output.size(2), output.size(3)]))` call?
>
> ...

Added a data point using an A16 GPU (batch_size 4, num_beams 1):

| FP16 | Weight-only-quant int8 |
| ------ | ------ |
| 35 secs decoding time | 33 secs decoding time |

...
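For what it's worth, the speedup implied by that data point is modest; a one-line check:

```python
# Speedup implied by the A16 data point above (batch_size 4, num_beams 1).
fp16_secs = 35
int8_weight_only_secs = 33

speedup = fp16_secs / int8_weight_only_secs
print(f"int8 weight-only vs FP16: {speedup:.2f}x")  # ~1.06x
```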

1. Could you try exporting with `--verbose` and attach the detailed log?
2. Or, I suggest using the latest TensorRT (>= 8.6) and a recent ONNX opset, e.g. 17, since it would...

`"Error Code 4: Internal Error (cnn_cache: for dimension number 3 in profile 0 does not match network definition (got min=7, opt=7, max=7), expected min=opt=max=14).)"`

It looks like you are using conv...
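The error above can be illustrated with a small, hypothetical checker (plain Python, not the TensorRT API) that mirrors the rule being violated: for each dynamic dimension, the optimization profile's (min, opt, max) must agree with what the network definition expects.

```python
# Illustrative sketch only -- check_profile is a hypothetical helper, not part of
# TensorRT. It reproduces the shape of the "got min=7 ... expected min=opt=max=14"
# message from the error above.

def check_profile(profile, expected):
    """profile: {dim_index: (min, opt, max)}; expected: {dim_index: size, -1 = fully dynamic}."""
    errors = []
    for dim, (mn, opt, mx) in profile.items():
        exp = expected.get(dim, -1)
        if exp != -1 and not (mn == opt == mx == exp):
            errors.append(
                f"dimension {dim}: got min={mn}, opt={opt}, max={mx}, "
                f"expected min=opt=max={exp}"
            )
    return errors

# cnn_cache dimension 3 is fixed at 14 by the network, but the profile supplies 7:
print(check_profile({3: (7, 7, 7)}, {3: 14}))
# A matching profile passes:
print(check_profile({3: (14, 14, 14)}, {3: 14}))  # []
```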

Would you mind trying https://github.com/wenet-e2e/wenet/blob/3eb9a8579d65a32b606aa04c89bdfcaca10d220b/runtime/gpu/tensorrt/run_streaming_small_model.sh first? I am not sure whether multi_cn_unified_conformer_exp.tar.gz works. Also, I suggest using the latest TensorRT without the layernorm plugin.

> Or I suggest to use latest...

> Hi @yuekaizhang, my guess is that the zero padding in the collate function introduces the discrepancy. To verify it, maybe you can use batch_size 1 and use the same length...
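The quoted hypothesis can be sketched with a toy example (an illustrative stand-in, not Whisper's actual pipeline): any reduction over the time axis that ignores sequence lengths, here a plain mean, changes once zero-padded frames from the collate function are included.

```python
import numpy as np

# Toy demonstration of the padding hypothesis: batch-padding with zeros perturbs
# length-agnostic statistics computed over the time axis.
rng = np.random.default_rng(0)
feats = rng.standard_normal((20, 8))                 # (time, feature), one utterance

padded = np.concatenate([feats, np.zeros((12, 8))])  # zero-padded to batch length 32

mean_true = feats.mean(axis=0)      # statistic on the unpadded sequence
mean_padded = padded.mean(axis=0)   # padded zeros drag the mean toward 0

print(np.allclose(mean_true, mean_padded))  # False: discrepancy from padding

# Restricting to the true length (i.e. masking the padding) recovers the statistic:
mean_masked = padded[:20].mean(axis=0)
print(np.allclose(mean_true, mean_masked))  # True
```

This is also why the suggested batch_size-1 / equal-length experiment is a good control: with no padding, the two paths should agree.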