Fine-tuned Whisper model cannot use “initial_prompt”

Open v-yunbin opened this issue 7 months ago • 1 comments

  1. initial_prompt

    When I convert the official Whisper model to CTranslate2 format, I can use “initial_prompt” normally. When I convert my fine-tuned Whisper model to CTranslate2 format and use “initial_prompt”, I get a strange result or an empty result. Decoding the same audio with my fine-tuned model, the WhisperGenerationResult with and without initial_prompt is:

    1) using initial_prompt: WhisperGenerationResult(sequences=[['<|0.00|>', '<|zh|>', '<|2.06|>', '<|zh|>', '<|4.06|>', '<|en|>', '好çļĦ', 'æĤ¨', '好', 'ï', '¼', 'Į', 'æĤ¨', '好', 'ï', '¼', 'Į', '请', 'éĹ®', 'ä¸Ģä¸ĭ', 'æĥħåĨµ', 'æĦŁ', 'è°¢', 'èĢ', 'IJ', 'å¿ĥ', 'ï', '¼', 'Į', 'èĢ', 'IJ', 'å¿ĥ', 'åĨį', '次', 'çŃī', 'å¾ħ', 'å¼ł', 'åħĪçĶŁ', '缮åīį', 'éĢļ', 'è¿ĩ', 'æŁ¥', 'çľĭ', 'åij¢', 'ï', '¼', 'Į', 'åĴ', '±', '们', 'æĺ¯', '没æľī', 'åĿ', '¦', 'åħĭ', 'ä¸ī', 'çĻ¾', 'P', 'TV', 'è½', '¦', 'é¡', '¶', 'çļĦ', 'ä¸Ģ个', 'ä¸Ĭ', 'å¸Ĥ', 'ä¿¡', 'æģ¯', 'çļĦ', '建', 'è®', '®', 'ï', '¼', 'Į', 'åĴ', '±', '们', 'æĮģ', 'ç»', 'Ń', 'åħ³', '注', 'çļĦ', 'ãĢĤ', '<|10.28|>']], sequences_ids=[[50364, 50260, 50467, 50260, 50567, 50259, 20715, 23414, 2131, 171, 120, 234, 23414, 2131, 171, 120, 234, 27908, 22064, 8861, 46514, 9709, 11340, 4450, 238, 7945, 171, 120, 234, 4450, 238, 7945, 8623, 9487, 10187, 18390, 44059, 33083, 39004, 19550, 16866, 42623, 4200, 6240, 171, 120, 234, 8975, 109, 9497, 1541, 17944, 14872, 99, 24881, 10960, 31906, 47, 12586, 17819, 99, 10178, 114, 1546, 20182, 5708, 27261, 17665, 26460, 1546, 34157, 7422, 106, 171, 120, 234, 8975, 109, 9497, 17694, 10115, 255, 28053, 26432, 1546, 1543, 50878]], scores=[-0.5639367699623108], no_speech_prob=0.0)

    2) using initial_prompt: WhisperGenerationResult(sequences=[['<|0.00|>', '<|zh|>', '<|0.26|>', '<|zh|>', '<|2.06|>', '<|zh|>', '<|2.20|>', '<|zh|>', '<|3.08|>', '<|zh|>', '<|3.76|>', '<|zh|>', '<|4.06|>']], sequences_ids=[[50364, 50260, 50377, 50260, 50467, 50260, 50474, 50260, 50518, 50260, 50552, 50260, 50567]], scores=[-2.519230842590332], no_speech_prob=0.0)

  2. CPU decoding

    When I convert the official Whisper model to CTranslate2 format and decode on CPU, the output is normal. When I convert my fine-tuned Whisper model to CTranslate2 format and decode on CPU, I get an empty result.

v-yunbin avatar Dec 06 '23 02:12 v-yunbin

This is because Whisper was originally also trained on examples where an initial prompt is fed in. If such examples were absent from your fine-tuning dataset, the model loses this ability.

This is a problem with your fine-tune, and it will be present independently of your conversion to CT2.
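
To make that concrete, here is a rough sketch of what a prompt-conditioned training example could look like. It assumes the usual Whisper decoder layout, where previous text is placed after `<|startofprev|>` and before `<|startoftranscript|>`, and one common choice of excluding the prompt tokens from the loss. The helper names `build_decoder_tokens` and `loss_mask` are made up for illustration and are not part of CTranslate2 or any Whisper library; adapt the idea to whatever fine-tuning script you use.

```python
from typing import List

SOT_PREV = "<|startofprev|>"        # marks the start of the previous-text prompt
SOT = "<|startoftranscript|>"       # marks the start of the actual transcript
EOT = "<|endoftext|>"


def build_decoder_tokens(
    prompt_tokens: List[str], language: str, text_tokens: List[str]
) -> List[str]:
    """Lay out one decoder sequence, optionally preceded by a text prompt.

    Hypothetical helper: token strings stand in for real tokenizer output.
    """
    tokens: List[str] = []
    if prompt_tokens:
        # The prompt goes first, introduced by <|startofprev|>.
        tokens += [SOT_PREV] + prompt_tokens
    tokens += [SOT, f"<|{language}|>", "<|transcribe|>", "<|notimestamps|>"]
    tokens += text_tokens + [EOT]
    return tokens


def loss_mask(tokens: List[str]) -> List[bool]:
    """True for positions that should contribute to the loss.

    Assumption: everything before <|startoftranscript|> (the prompt) is
    excluded, so the model conditions on the prompt without learning to emit it.
    """
    start = tokens.index(SOT) if SOT in tokens else 0
    return [i >= start for i in range(len(tokens))]


# Mix prompted and unprompted examples so both modes keep working.
with_prompt = build_decoder_tokens(["glossary", "terms"], "zh", ["hello"])
without_prompt = build_decoder_tokens([], "zh", ["hello"])
```

If the fine-tuning set only ever contains sequences like `without_prompt`, the model never sees `<|startofprev|>` during fine-tuning, which would be consistent with the behaviour described in this issue.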

NeonBohdan avatar Jan 12 '24 17:01 NeonBohdan