WhisperS2T
TensorRT - avg_logprob
Thanks for your really impressive work.
I was wondering how to extract the token probabilities with the TensorRT backend (a bit like what you already expose in this example with CTranslate2):
import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['data/KINCAID46/audio/1.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=32)
print(out[0][0])
"""
[Console Output]
{'text': "Let's bring in Phil Mackie who is there at the palace. We're looking at Teresa and Philip May. Philip, can you see how he's being transferred from the helicopters? It looks like, as you said, the beast. It's got its headlights on because the sun is beginning to set now, certainly sinking behind some clouds. It's about a quarter of a mile away down the Grand Drive",
'avg_logprob': -0.25426941679184695,
'no_speech_prob': 8.147954940795898e-05,
'start_time': 0.0,
'end_time': 24.8}
"""
I'm asking because having access to these scores would let us implement a language detection method on top of WhisperS2T.
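For context, here is roughly what I have in mind: run a trial transcription once per candidate `lang_code` and keep the language whose segment scored the highest `avg_logprob`. The helper below is hypothetical (not part of WhisperS2T); the scores dict would be filled from `out[0][0]['avg_logprob']` as in the example above.

```python
def detect_language(scores_by_lang):
    """Pick the language whose trial transcription scored highest.

    scores_by_lang: dict mapping lang code -> avg_logprob, e.g. collected by
    calling model.transcribe_with_vad(...) once per candidate lang_code and
    reading out[0][0]['avg_logprob'] (hypothetical workflow, not built in).
    """
    return max(scores_by_lang, key=scores_by_lang.get)

# e.g. detect_language({'en': -0.25, 'fr': -1.31, 'de': -1.05}) -> 'en'
```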
@OValery16 https://github.com/NVIDIA/TensorRT-LLM/issues/1127
You may check this: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L2267. You may also need to register the logits as one of the output tensors.
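If you do manage to get the per-step decoder logits out of the runtime, `avg_logprob` is just the mean log-softmax score of the tokens that were actually emitted. A minimal backend-agnostic sketch (pure Python, hypothetical helper, independent of any TensorRT-LLM API):

```python
import math

def avg_logprob(logits_per_step, token_ids):
    """Average log-probability of the emitted tokens.

    logits_per_step: list of logit vectors, one per decoding step.
    token_ids: the token chosen at each step.
    """
    total = 0.0
    for logits, tok in zip(logits_per_step, token_ids):
        # Numerically stable log-softmax: logit - logsumexp(logits).
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += logits[tok] - lse
    return total / len(token_ids)
```

With the CTranslate2 backend this value comes back precomputed as `avg_logprob`; the open question is where to tap the equivalent tensors in the TensorRT engine.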
Thanks for the tips. I don't really see how to do it without modifying the TensorRT-LLM library itself. Do you know how?