Results: 82 comments of Yuekai Zhang

Hi @Bhuvanesh09, great work! I will take this PR into our internal GitLab to run some CI tests. We will credit your work in the release notes for the distill-whisper model.

This is really nice work! Many thanks, @Eddie-Wang1120. I will import this into our internal GitLab, and hopefully it can be done this week.

> Thank you so much @symphonylyh for the guidelines!
>
> > Lastly, from a practical perspective, the w/o BERT plugin path has a limitation on padding removal -- that is, ...

> So my concern is not whether we can run it. If we infer with `dynamic seq len`, what I observed is that whisper's decoder makes...

> @yuekaizhang I have less background on the Whisper discussion here, but do you mean the current `functional.py::conv2d()` cannot handle dynamic axes due to the `output.view(concat([output.size(1), output.size(2), output.size(3)]))` call?
>
> ...

Added a data point using an A16 GPU (batch_size 4, num_beams 1):

| FP16 | Weight-only-quant int8 |
| ------ | ------ |
| 35 secs decoding time | 33 secs decoding time |

...
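For what it's worth, the speedup implied by that data point is modest; a one-line check:

```python
# Speedup implied by the A16 data point above (batch_size 4, num_beams 1).
fp16_secs = 35
int8_weight_only_secs = 33

speedup = fp16_secs / int8_weight_only_secs
print(f"int8 weight-only vs FP16: {speedup:.2f}x")  # ~1.06x
```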

1. Could you try exporting with `--verbose` and attach the detailed log?
2. Or, I suggest using the latest TensorRT (>= 8.6) and a recent ONNX opset, e.g. 17, since it would...

`"Error Code 4: Internal Error (cnn_cache: for dimension number 3 in profile 0 does not match network definition (got min=7, opt=7, max=7), expected min=opt=max=14).)"`

It looks like you are using conv...
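The error above can be illustrated with a small, hypothetical checker (plain Python, not the TensorRT API) that mirrors the rule being violated: for each dynamic dimension, the optimization profile's (min, opt, max) must agree with what the network definition expects.

```python
# Illustrative sketch only -- check_profile is a hypothetical helper, not part of
# TensorRT. It reproduces the shape of the "got min=7 ... expected min=opt=max=14"
# message from the error above.

def check_profile(profile, expected):
    """profile: {dim_index: (min, opt, max)}; expected: {dim_index: size, -1 = fully dynamic}."""
    errors = []
    for dim, (mn, opt, mx) in profile.items():
        exp = expected.get(dim, -1)
        if exp != -1 and not (mn == opt == mx == exp):
            errors.append(
                f"dimension {dim}: got min={mn}, opt={opt}, max={mx}, "
                f"expected min=opt=max={exp}"
            )
    return errors

# cnn_cache dimension 3 is fixed at 14 by the network, but the profile supplies 7:
print(check_profile({3: (7, 7, 7)}, {3: 14}))
# A matching profile passes:
print(check_profile({3: (14, 14, 14)}, {3: 14}))  # []
```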

Would you mind trying https://github.com/wenet-e2e/wenet/blob/3eb9a8579d65a32b606aa04c89bdfcaca10d220b/runtime/gpu/tensorrt/run_streaming_small_model.sh first? I am not sure whether multi_cn_unified_conformer_exp.tar.gz works. Also, I suggest using the latest TensorRT without the layernorm plugin.

> Or I suggest to use latest...

> Hi @yuekaizhang, my guess is that the zero padding in the collate function introduces the discrepancy. To verify it, maybe you can use batch_size 1 and use the same length...
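The quoted hypothesis can be sketched with a toy example (an illustrative stand-in, not Whisper's actual pipeline): any reduction over the time axis that ignores sequence lengths, here a plain mean, changes once zero-padded frames from the collate function are included.

```python
import numpy as np

# Toy demonstration of the padding hypothesis: batch-padding with zeros perturbs
# length-agnostic statistics computed over the time axis.
rng = np.random.default_rng(0)
feats = rng.standard_normal((20, 8))                 # (time, feature), one utterance

padded = np.concatenate([feats, np.zeros((12, 8))])  # zero-padded to batch length 32

mean_true = feats.mean(axis=0)      # statistic on the unpadded sequence
mean_padded = padded.mean(axis=0)   # padded zeros drag the mean toward 0

print(np.allclose(mean_true, mean_padded))  # False: discrepancy from padding

# Restricting to the true length (i.e. masking the padding) recovers the statistic:
mean_masked = padded[:20].mean(axis=0)
print(np.allclose(mean_true, mean_masked))  # True
```

This is also why the suggested batch_size-1 / equal-length experiment is a good control: with no padding, the two paths should agree.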