Yuekai Zhang
> Updates here. I found some [discussions](https://github.com/openai/whisper/discussions/830) about converting whisper to a huggingface file, including changing layer names. If I proceed in the reverse direction of the work in the link above, could...
> @yuekaizhang Yeah, I have seen that page, but I'm not sure the way it converts huggingface to openai works well even if the model is not distil-whisper (because of...
> Hi @yuekaizhang, > > I've successfully compiled hf-whisper to tensorrt-llm and am currently looking to deploy the model using Triton. However, I'm encountering some confusion regarding the expected I/O...
> @yuekaizhang > > Thank you for sharing and for the quick reply! > > I reviewed the link you mentioned and have some questions regarding the implementation: > > 1. In...
Did you first use this file: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/distil_whisper/convert_from_distil_whisper.py? See https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper#distil-whisper; you may need to convert the huggingface checkpoint first. @esnvidia
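For context, a minimal sketch of why that conversion step is needed: the Hugging Face and openai-whisper checkpoints use different parameter names, and the conversion script remaps them before the TensorRT-LLM engine build. The model id below is the one discussed in this thread; the printed key names are illustrative, not output from the actual script.

```python
# Sketch: compare Hugging Face vs. openai-whisper parameter naming.
# Assumes `transformers` is installed; "distil-whisper/distil-large-v2"
# is the checkpoint discussed in this thread.
from transformers import WhisperForConditionalGeneration

hf_model = WhisperForConditionalGeneration.from_pretrained(
    "distil-whisper/distil-large-v2"
)

# Hugging Face naming, e.g. "model.encoder.layers.0.self_attn.q_proj.weight".
print(list(hf_model.state_dict().keys())[:5])

# openai-whisper expects names like "encoder.blocks.0.attn.query.weight",
# which is the layout convert_from_distil_whisper.py produces before the
# TensorRT-LLM engine is built.
```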
Oh, I see. For distil-large-v2, you should use the default multilingual tokenizer rather than gpt2. @esnvidia
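A quick sketch of why the tokenizer choice matters, using the openai-whisper package: "gpt2" is the English-only vocabulary and "multilingual" is the multilingual one, and their special-token ids differ, so decoding a multilingual model with the gpt2 tokenizer produces garbage (hence WER near 100%).

```python
# Sketch: multilingual vs. gpt2 (English-only) Whisper tokenizers.
from whisper.tokenizer import get_tokenizer

multilingual = get_tokenizer(multilingual=True, language="en", task="transcribe")
english_only = get_tokenizer(multilingual=False)  # the "gpt2" vocabulary

# The special-token ids differ between the two vocabularies, so the wrong
# tokenizer maps the model's output ids to the wrong text.
print(multilingual.sot, english_only.sot)  # start-of-transcript ids differ
print(multilingual.eot, english_only.eot)  # end-of-text ids differ
```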
> Yes, here's the exact steps I ran: https://github.com/esnvidia/distil_whisper_hf2_triton Also, you are welcome to contribute this triton model_repo for distil whisper to sherpa/triton/whisper if you have some free time.
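For anyone following the Triton I/O question above, here is a rough client-side sketch against a sherpa-style whisper model_repo. The model name and tensor names ("whisper", "WAV", "TEXT_PREFIX", "TRANSCRIPTS") are assumptions based on that style of repo; check them against the actual config.pbtxt.

```python
# Rough sketch of a Triton gRPC client for a whisper model_repo.
# ASSUMPTIONS: model name "whisper" and tensor names "WAV", "TEXT_PREFIX",
# "TRANSCRIPTS" follow the sherpa-style repo; verify against config.pbtxt.
import numpy as np
import soundfile as sf
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

wav, sr = sf.read("test.wav", dtype="float32")  # 16 kHz mono expected
wav = wav[np.newaxis, :]                        # shape [1, num_samples]

inputs = [
    grpcclient.InferInput("WAV", wav.shape, "FP32"),
    grpcclient.InferInput("TEXT_PREFIX", [1, 1], "BYTES"),
]
inputs[0].set_data_from_numpy(wav)
prefix = np.array(
    [["<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"]], dtype=object
)
inputs[1].set_data_from_numpy(prefix)

outputs = [grpcclient.InferRequestedOutput("TRANSCRIPTS")]
result = client.infer(model_name="whisper", inputs=inputs, outputs=outputs)
print(result.as_numpy("TRANSCRIPTS")[0].decode())
```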
> @yuekaizhang confirmed the need for multilingual. This needs to be updated in the docs. Updated. Now users don't need to specify tokenizer_name themselves.
> Is there a PR tied to this? Yes. I have updated it on GitLab; it will sync to GitHub a few days later. > Also getting 100% WER using the...
> I think this issue may be related to GPU architecture. Is there another method for using quantization with the Whisper model? Yes. WOQ int8 cannot significantly improve inference speed,...
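To make that concrete, here is a conceptual sketch of weight-only quantization (WOQ), not TensorRT-LLM's actual kernels: weights are stored in int8 with per-channel scales but are dequantized back to floating point before the matmul, so WOQ mainly saves weight memory and bandwidth rather than compute, and the realized speedup depends on GPU architecture and kernel support.

```python
# Conceptual sketch of weight-only int8 quantization (WOQ); NOT the actual
# TensorRT-LLM implementation. Weights are stored as int8 plus per-channel
# scales, then dequantized before the matmul.
import torch

def quantize_weight_only_int8(w: torch.Tensor):
    """Per-output-channel symmetric int8 quantization of a [out, in] weight."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    w_int8 = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return w_int8, scale

def woq_linear(x: torch.Tensor, w_int8: torch.Tensor, scale: torch.Tensor):
    """Dequantize the weight, then run an ordinary float matmul.
    The GEMM itself is still floating point, which is why WOQ saves memory
    but does not necessarily speed up compute-bound layers."""
    w_deq = w_int8.float() * scale
    return x @ w_deq.t()

w = torch.randn(1024, 1024)  # fp16 in a real engine; fp32 here for portability
x = torch.randn(8, 1024)
w_int8, scale = quantize_weight_only_int8(w)
y = woq_linear(x, w_int8, scale)
print(y.shape, (y - x @ w.t()).abs().mean().item())  # small quantization error
```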