TensorRT-LLM
100% WER on distil-whisper/distil-large-v2
System Info
DGX V100 and DGX A100
Who can help?
@ncomly-nvidia to add more folks.
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Followed the whisper example. Got example engines working on A100 80GB and V100-16GB.
To save the HF model in bin format I did:
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, pipeline
import torch
from datasets import load_dataset, load_from_disk

# Use fp16 on GPU, fp32 on CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v2"

# Load the HF checkpoint and re-save it as a .bin checkpoint (not safetensors)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=False
)
model.save_pretrained('./distil-whisper/distil-large-v2', safe_serialization=False)
I had to download the mel_filters.npz and gpt2.tiktoken separately per the directions.
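For completeness, here is a minimal Python sketch of that download step; the gpt2.tiktoken URL is the one from the example README, while the mel_filters.npz URL is an assumption that the file sits in the same openai/whisper assets directory:

import os
import urllib.request

assets_dir = "assets"
os.makedirs(assets_dir, exist_ok=True)

asset_urls = [
    # URL taken from the whisper example README
    "https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/gpt2.tiktoken",
    # Assumed to live alongside gpt2.tiktoken in the openai/whisper repo
    "https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz",
]

for url in asset_urls:
    dest = os.path.join(assets_dir, os.path.basename(url))
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)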
Example build and run cmds:
output_dir=distil_whisper_large_v2
python3 build.py --model_dir /workspace/models/whisper/assets/ --model_name distil-large-v2 --output_dir $output_dir --dtype float16 --enable_context_fmha --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin float16
python3 run.py --engine_dir $output_dir --dataset hf-internal-testing/librispeech_asr_dummy --name librispeech_dummy_output --tokenizer_name gpt2 --assets_dir /models/whisper/assets/ --dataset librispeech_asr --results_dir /models/whisper/results
Expected behavior
Not getting >100% WER on librispeech_asr :)
Actual behavior
in errs-librispeech.txt
%WER = 150.73
Errors: 28722 insertions, 3162 deletions, 50714 substitutions, over 54798 reference words (922 correct)
Search below for sections starting with PER-UTT DETAILS:, SUBSTITUTIONS:, DELETIONS:, INSERTIONS:, PER-WORD STATS:
in rtf-librispeech.txt
RTF: 0.0098
total_duration: 19396.121 seconds (5.39 hours)
processing time: 189.115 seconds (0.05 hours)
batch size: 4
num_beams: 1
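As a quick sanity check on those two logs, redoing the arithmetic (plain Python, nothing TensorRT-LLM specific) reproduces both reported figures:

# Recompute WER and RTF from the counts reported above
insertions, deletions, substitutions = 28722, 3162, 50714
reference_words = 54798
wer = 100.0 * (insertions + deletions + substitutions) / reference_words
print(f"%WER = {wer:.2f}")  # 150.73 -- dominated by insertions and substitutions

processing_time_s = 189.115
total_audio_s = 19396.121
print(f"RTF = {processing_time_s / total_audio_s:.4f}")  # 0.0098 -- fast, but the transcripts are wrong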
Additional notes
n/a
Did you first use this file: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/distil_whisper/convert_from_distil_whisper.py ?
See https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper#distil-whisper; you may need to convert the Hugging Face checkpoint first.
@esnvidia
Yes, here's the exact steps I ran:
https://github.com/esnvidia/distil_whisper_hf2_triton
The test step:
python run.py --engine_dir $output_dir --name librispeech_dummy_output --tokenizer_name gpt2 --assets_dir ./assets/ --dataset librispeech_asr --results_dir ./results
Needs a little tweak to the cmd but should be simple for you to figure out.
Oh, I see. For distil-large-v2, you should use the default multilingual tokenizer rather than gpt2. @esnvidia
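(A small sketch of why the wrong tokenizer yields >100% WER, assuming the openai-whisper package is available: the model emits token IDs from the multilingual vocabulary, and decoding those IDs with the gpt2 vocabulary produces unrelated text.)

# Sketch: decode the same IDs with the two Whisper tokenizers
# (get_tokenizer comes from the openai-whisper package).
from whisper.tokenizer import get_tokenizer

multilingual = get_tokenizer(multilingual=True)    # backed by multilingual.tiktoken
english_only = get_tokenizer(multilingual=False)   # backed by gpt2.tiktoken

ids = multilingual.encode(" hello world")
print(multilingual.decode(ids))   # " hello world"
print(english_only.decode(ids))   # same IDs, different vocabulary -> garbled text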
Yes, here's the exact steps I ran: https://github.com/esnvidia/distil_whisper_hf2_triton
Also, you are welcome to contribute this Triton model_repo for Distil-Whisper to sherpa/triton/whisper if you have some free time.
@yuekaizhang Are you sure it's multilingual? The step in the example shows gpt2:
here is the cmd
python3 run.py --engine_dir $output_dir --dataset hf-internal-testing/librispeech_asr_dummy --name librispeech_dummy_${output_dir} --tokenizer_name gpt2
as well as this step:
# download the gpt2.tiktoken
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/gpt2.tiktoken
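(If the multilingual tokenizer is what's needed, the asset to fetch would presumably be multilingual.tiktoken instead; the URL below is an assumption that it sits next to gpt2.tiktoken in the openai/whisper repo.)

import urllib.request

# Assumed location, mirroring the gpt2.tiktoken link above
url = "https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/multilingual.tiktoken"
urllib.request.urlretrieve(url, "assets/multilingual.tiktoken")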
@yuekaizhang confirmed the need for multilingual. This needs to be updated in the docs.
@yuekaizhang confirmed the need for multilingual. This needs to be updated in the docs.
Updated it. Now users don't need to specify tokenizer_name by themselves.
Awesome, but I still don't see the change reflected in the main branch. I'm looking here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper#distil-whisper
Is there a PR tied to this?
Also getting 100% WER using the Triton-ASR-Client, by the way. Let me know if you want me to file an issue there. I think the fix simply involves copying the functions from the run.py here, since I was able to get 3% WER with that.
I can contribute to sherpa etc once this works E2E. :)
Is there a PR tied to this?
Yes. I have updated it in GitLab. It will sync to GitHub in several days.
Also getting 100% WER using the Triton-ASR-Client by the way. Let me know if you want me to file an issue there. I think it simply involves copying the functions from the run.py here since I was able to get the 3% WER with that.
https://github.com/k2-fsa/sherpa/tree/master/triton/whisper#benchmark-using-dataset Could you try --whisper-prompt "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"? If it doesn't work, you may file an issue under sherpa and attach more details. I will investigate there.
I can contribute to sherpa etc once this works E2E. :)
That sounds great. @esnvidia
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.