Yuekai Zhang
> Updates here. I found some [discussions](https://github.com/openai/whisper/discussions/830) about converting whisper to a huggingface file, including changing layer names. If I proceed in the reverse direction of the work in the link above, could...
> @yuekaizhang Yeah, I have seen that page, but I'm not sure the way it converts huggingface to openai works well even if the model is not distil-whisper (because of...
> Hi @yuekaizhang, > > I've successfully compiled hf-whisper to tensorrt-llm and am currently looking to deploy the model using Triton. However, I'm encountering some confusion regarding the expected I/O...
> @yuekaizhang > > Thank you for sharing and for the quick reply! > > I reviewed the link you mentioned and have some questions regarding the implementation: > > 1. In...
Did you first use this file: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/distil_whisper/convert_from_distil_whisper.py? See https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper#distil-whisper; you may need to convert the huggingface checkpoint first. @esnvidia
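For context, a minimal sketch of why that conversion step is needed: the Hugging Face and openai-whisper checkpoints use different parameter names, and the conversion script remaps them before the TensorRT-LLM engine build. The model id below is the one discussed in this thread; the printed key names are illustrative, not output from the actual script.

```python
# Sketch: compare Hugging Face vs. openai-whisper parameter naming.
# Assumes `transformers` is installed; "distil-whisper/distil-large-v2"
# is the checkpoint discussed in this thread.
from transformers import WhisperForConditionalGeneration

hf_model = WhisperForConditionalGeneration.from_pretrained(
    "distil-whisper/distil-large-v2"
)

# Hugging Face naming, e.g. "model.encoder.layers.0.self_attn.q_proj.weight".
print(list(hf_model.state_dict().keys())[:5])

# openai-whisper expects names like "encoder.blocks.0.attn.query.weight",
# which is the layout convert_from_distil_whisper.py produces before the
# TensorRT-LLM engine is built.
```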
Oh, I see. For distil-large-v2, you should use the default multilingual tokenizer rather than gpt2. @esnvidia
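A quick sketch of why the tokenizer choice matters, using the openai-whisper package: "gpt2" is the English-only vocabulary and "multilingual" is the multilingual one, and their special-token ids differ, so decoding a multilingual model with the gpt2 tokenizer produces garbage (hence WER near 100%).

```python
# Sketch: multilingual vs. gpt2 (English-only) Whisper tokenizers.
from whisper.tokenizer import get_tokenizer

multilingual = get_tokenizer(multilingual=True, language="en", task="transcribe")
english_only = get_tokenizer(multilingual=False)  # the "gpt2" vocabulary

# The special-token ids differ between the two vocabularies, so the wrong
# tokenizer maps the model's output ids to the wrong text.
print(multilingual.sot, english_only.sot)  # start-of-transcript ids differ
print(multilingual.eot, english_only.eot)  # end-of-text ids differ
```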
> Yes, here's the exact steps I ran: https://github.com/esnvidia/distil_whisper_hf2_triton Also, you are welcome to contribute this triton model_repo for distil whisper to sherpa/triton/whisper if you have some free time.
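For anyone following the Triton I/O question above, here is a rough client-side sketch against a sherpa-style whisper model_repo. The model name and tensor names ("whisper", "WAV", "TEXT_PREFIX", "TRANSCRIPTS") are assumptions based on that style of repo; check them against the actual config.pbtxt.

```python
# Rough sketch of a Triton gRPC client for a whisper model_repo.
# ASSUMPTIONS: model name "whisper" and tensor names "WAV", "TEXT_PREFIX",
# "TRANSCRIPTS" follow the sherpa-style repo; verify against config.pbtxt.
import numpy as np
import soundfile as sf
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

wav, sr = sf.read("test.wav", dtype="float32")  # 16 kHz mono expected
wav = wav[np.newaxis, :]                        # shape [1, num_samples]

inputs = [
    grpcclient.InferInput("WAV", wav.shape, "FP32"),
    grpcclient.InferInput("TEXT_PREFIX", [1, 1], "BYTES"),
]
inputs[0].set_data_from_numpy(wav)
prefix = np.array(
    [["<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"]], dtype=object
)
inputs[1].set_data_from_numpy(prefix)

outputs = [grpcclient.InferRequestedOutput("TRANSCRIPTS")]
result = client.infer(model_name="whisper", inputs=inputs, outputs=outputs)
print(result.as_numpy("TRANSCRIPTS")[0].decode())
```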
> @yuekaizhang confirmed the need for multilingual. This needs to be updated in the docs. Updated. Now users don't need to specify tokenizer_name themselves.
> Is there a PR tied to this? Yes. I have updated it on GitLab; it will sync to GitHub a few days later. > Also getting 100% WER using the...
> I think this issue may be related to GPU architecture. Is there another method for using quantization with the Whisper model? Yes. WOQ int8 cannot significantly improve inference speed,...
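To make that concrete, here is a conceptual sketch of weight-only quantization (WOQ), not TensorRT-LLM's actual kernels: weights are stored in int8 with per-channel scales but are dequantized back to floating point before the matmul, so WOQ mainly saves weight memory and bandwidth rather than compute, and the realized speedup depends on GPU architecture and kernel support.

```python
# Conceptual sketch of weight-only int8 quantization (WOQ); NOT the actual
# TensorRT-LLM implementation. Weights are stored as int8 plus per-channel
# scales, then dequantized before the matmul.
import torch

def quantize_weight_only_int8(w: torch.Tensor):
    """Per-output-channel symmetric int8 quantization of a [out, in] weight."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    w_int8 = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return w_int8, scale

def woq_linear(x: torch.Tensor, w_int8: torch.Tensor, scale: torch.Tensor):
    """Dequantize the weight, then run an ordinary float matmul.
    The GEMM itself is still floating point, which is why WOQ saves memory
    but does not necessarily speed up compute-bound layers."""
    w_deq = w_int8.float() * scale
    return x @ w_deq.t()

w = torch.randn(1024, 1024)  # fp16 in a real engine; fp32 here for portability
x = torch.randn(8, 1024)
w_int8, scale = quantize_weight_only_int8(w)
y = woq_linear(x, w_int8, scale)
print(y.shape, (y - x @ w.t()).abs().mean().item())  # small quantization error
```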