
TensorRT - avg_logprob

Open OValery16 opened this issue 1 year ago • 4 comments

Thanks for your really impressive work.

I was wondering how to extract the token probabilities with the TensorRT backend (a bit like what you did in this example with CTranslate2):

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['data/KINCAID46/audio/1.wav']
lang_codes = ['en']
tasks = ['transcribe']
initial_prompts = [None]

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=32)

print(out[0][0])
"""
[Console Output]

{'text': "Let's bring in Phil Mackie who is there at the palace. We're looking at Teresa and Philip May. Philip, can you see how he's being transferred from the helicopters? It looks like, as you said, the beast. It's got its headlights on because the sun is beginning to set now, certainly sinking behind some clouds. It's about a quarter of a mile away down the Grand Drive",
 'avg_logprob': -0.25426941679184695,
 'no_speech_prob': 8.147954940795898e-05,
 'start_time': 0.0,
 'end_time': 24.8}
"""

OValery16 · Feb 20 '24 13:02

I'm asking because having access to these scores would let us implement a language detection method, along the lines of the sketch below.
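Something like this is what I have in mind, reusing the transcribe_with_vad call from above (untested sketch; the candidate language set is made up, and it assumes avg_logprob values are comparable across lang_codes):

import whisper_s2t

model = whisper_s2t.load_model(model_identifier="large-v2", backend='CTranslate2')

files = ['data/KINCAID46/audio/1.wav']
candidate_langs = ['en', 'fr', 'de']  # hypothetical candidate set

# Score the same file under each candidate language and keep the
# language whose transcription gets the highest avg_logprob.
scores = {}
for lang in candidate_langs:
    out = model.transcribe_with_vad(files,
                                    lang_codes=[lang],
                                    tasks=['transcribe'],
                                    initial_prompts=[None],
                                    batch_size=32)
    scores[lang] = out[0][0]['avg_logprob']

detected = max(scores, key=scores.get)
print(detected, scores)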

OValery16 · Feb 20 '24 21:02

@OValery16 https://github.com/NVIDIA/TensorRT-LLM/issues/1127

shashikg · Feb 21 '24 17:02

> Thanks for your really impressive work. I was wondering how to extract the token probabilities with the TensorRT backend (a bit like what you did in this example with CTranslate2) […]

You may check this https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L2267. You may also need to register the logits as one of the output tensors.
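Getting the logits out of the engine is the TensorRT-LLM-specific part (hence registering them as an output tensor and rebuilding the engine). Once you have a [num_tokens, vocab_size] logits tensor for the generated tokens, the scoring itself is backend-independent. A rough sketch of that second half (the function name and shapes are illustrative, not the actual WhisperS2T or TensorRT-LLM API):

import torch
import torch.nn.functional as F

def avg_logprob_from_logits(logits: torch.Tensor, token_ids: torch.Tensor) -> float:
    # logits: [num_tokens, vocab_size] raw decoder outputs for the generated tokens.
    # token_ids: [num_tokens] ids of the tokens that were actually decoded.
    log_probs = F.log_softmax(logits.float(), dim=-1)
    # Pick out the log probability of each decoded token, then average.
    token_log_probs = log_probs.gather(1, token_ids.unsqueeze(1)).squeeze(1)
    return token_log_probs.mean().item()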

yuekaizhang · Feb 22 '24 05:02

Thanks for your tips. I don't really see how to do it without modifying the TensorRT-LLM library. Do you know how to do it?

OValery16 · Mar 01 '24 10:03