CTranslate2
Fine-tuned Whisper model cannot use "initial_prompt"
-
I converted the official Whisper model to CTranslate2 format and can use "initial_prompt" normally. But when I convert my fine-tuned Whisper model to CTranslate2 format and use "initial_prompt", I get a strange or empty result. Decoding the same audio with my fine-tuned model, the WhisperGenerationResult without and with initial_prompt is: 1) not using initial_prompt:
WhisperGenerationResult(sequences=[['<|0.00|>', '<|zh|>', '<|2.06|>', '<|zh|>', '<|4.06|>', '<|en|>', '好çļĦ', 'æĤ¨', '好', 'ï', '¼', 'Į', 'æĤ¨', '好', 'ï', '¼', 'Į', '请', 'éĹ®', 'ä¸Ģä¸ĭ', 'æĥħåĨµ', 'æĦŁ', 'è°¢', 'èĢ', 'IJ', 'å¿ĥ', 'ï', '¼', 'Į', 'èĢ', 'IJ', 'å¿ĥ', 'åĨį', '次', 'çŃī', 'å¾ħ', 'å¼ł', 'åħĪçĶŁ', '缮åīį', 'éĢļ', 'è¿ĩ', 'æŁ¥', 'çľĭ', 'åij¢', 'ï', '¼', 'Į', 'åĴ', '±', '们', 'æĺ¯', '没æľī', 'åĿ', '¦', 'åħĭ', 'ä¸ī', 'çĻ¾', 'P', 'TV', 'è½', '¦', 'é¡', '¶', 'çļĦ', 'ä¸Ģ个', 'ä¸Ĭ', 'å¸Ĥ', 'ä¿¡', 'æģ¯', 'çļĦ', '建', 'è®', '®', 'ï', '¼', 'Į', 'åĴ', '±', '们', 'æĮģ', 'ç»', 'Ń', 'åħ³', '注', 'çļĦ', 'ãĢĤ', '<|10.28|>']], sequences_ids=[[50364, 50260, 50467, 50260, 50567, 50259, 20715, 23414, 2131, 171, 120, 234, 23414, 2131, 171, 120, 234, 27908, 22064, 8861, 46514, 9709, 11340, 4450, 238, 7945, 171, 120, 234, 4450, 238, 7945, 8623, 9487, 10187, 18390, 44059, 33083, 39004, 19550, 16866, 42623, 4200, 6240, 171, 120, 234, 8975, 109, 9497, 1541, 17944, 14872, 99, 24881, 10960, 31906, 47, 12586, 17819, 99, 10178, 114, 1546, 20182, 5708, 27261, 17665, 26460, 1546, 34157, 7422, 106, 171, 120, 234, 8975, 109, 9497, 17694, 10115, 255, 28053, 26432, 1546, 1543, 50878]], scores=[-0.5639367699623108], no_speech_prob=0.0)
2) using initial_prompt: WhisperGenerationResult(sequences=[['<|0.00|>', '<|zh|>', '<|0.26|>', '<|zh|>', '<|2.06|>', '<|zh|>', '<|2.20|>', '<|zh|>', '<|3.08|>', '<|zh|>', '<|3.76|>', '<|zh|>', '<|4.06|>']], sequences_ids=[[50364, 50260, 50377, 50260, 50467, 50260, 50474, 50260, 50518, 50260, 50552, 50260, 50567]], scores=[-2.519230842590332], no_speech_prob=0.0)
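For reference, the sequences_ids dumps above can be decoded by hand: in the multilingual Whisper vocabulary, timestamp tokens start at ID 50364 (<|0.00|>) and advance in 0.02 s steps, which is why ID 50467 appears as '<|2.06|>'. A minimal sketch (the helper name is my own, not part of any API):

```python
# Multilingual Whisper vocabulary: timestamp tokens begin at <|0.00|>
# (ID 50364) and advance in steps of 0.02 seconds.
TIMESTAMP_BEGIN = 50364

def timestamp_of(token_id):
    """Return the time in seconds encoded by a Whisper timestamp token."""
    if token_id < TIMESTAMP_BEGIN:
        raise ValueError("not a timestamp token")
    return (token_id - TIMESTAMP_BEGIN) * 0.02

# IDs taken from the results above: 50467 -> 2.06 s, 50878 -> 10.28 s.
```

Reading the second result this way shows it contains only alternating timestamp and language tokens, i.e. no actual transcription text.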
-
CPU decoding
I converted the official Whisper model to CTranslate2 format, and decoding on CPU works normally. When I convert my fine-tuned Whisper model to CTranslate2 format and decode on CPU, I get an empty result.
This is because Whisper was originally trained on examples where an initial prompt is fed as well. If such examples were absent from your fine-tuning dataset, this feature will stop working.
This is a problem with your fine-tune and will be present independently of your conversion to CT2.
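To make the answer concrete: during training, Whisper sometimes prepends the previous segment's text, marked by the <|startofprev|> token, before the usual <|startoftranscript|> sequence, and initial_prompt reuses that mechanism at inference time. A fine-tune that never saw <|startofprev|>-prefixed examples loses the ability to condition on it. A minimal sketch of how the decoder input is assembled (the token IDs are the standard multilingual-Whisper special tokens; the function is illustrative, not the CTranslate2 API):

```python
SOT_PREV = 50361         # <|startofprev|> in the multilingual Whisper vocabulary
MAX_PROMPT_TOKENS = 223  # Whisper keeps at most n_text_ctx // 2 - 1 prompt tokens

def build_decoder_input(initial_prompt_ids, forced_ids):
    """Prepend an initial prompt (as token IDs) to the forced decoder tokens.

    forced_ids is the usual [<|startoftranscript|>, <|language|>, <|task|>, ...]
    sequence. The prompt only helps if the model saw <|startofprev|>-prefixed
    examples during (fine-)tuning.
    """
    tokens = []
    if initial_prompt_ids:
        tokens.append(SOT_PREV)
        # Only the most recent tokens of a long prompt are kept.
        tokens.extend(initial_prompt_ids[-MAX_PROMPT_TOKENS:])
    tokens.extend(forced_ids)
    return tokens
```

Wrappers such as faster-whisper perform this tokenization internally when you pass initial_prompt to transcribe(); the model side, however, can only be fixed by including prompted examples in the fine-tuning data.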