Fine-tuned Whisper model cannot use “initial_prompt”

Open ben-8878 opened this issue 1 year ago • 1 comments

  1. initial_prompt

    When I convert the official Whisper model to CTranslate2 format, I can use “initial_prompt” normally. When I convert my fine-tuned Whisper model to CTranslate2 format and use “initial_prompt”, I get a strange or empty result. Decoding the same audio with my fine-tuned model, the WhisperGenerationResult with and without initial_prompt is:

    1) using initial_prompt: WhisperGenerationResult(sequences=[['<|0.00|>', '<|zh|>', '<|2.06|>', '<|zh|>', '<|4.06|>', '<|en|>', '好çļĦ', 'æĤ¨', '好', 'ï', '¼', 'Į', 'æĤ¨', '好', 'ï', '¼', 'Į', '请', 'éĹ®', 'ä¸Ģä¸ĭ', 'æĥħåĨµ', 'æĦŁ', 'è°¢', 'èĢ', 'IJ', 'å¿ĥ', 'ï', '¼', 'Į', 'èĢ', 'IJ', 'å¿ĥ', 'åĨį', '次', 'çŃī', 'å¾ħ', 'å¼ł', 'åħĪçĶŁ', '缮åīį', 'éĢļ', 'è¿ĩ', 'æŁ¥', 'çľĭ', 'åij¢', 'ï', '¼', 'Į', 'åĴ', '±', '们', 'æĺ¯', '没æľī', 'åĿ', '¦', 'åħĭ', 'ä¸ī', 'çĻ¾', 'P', 'TV', 'è½', '¦', 'é¡', '¶', 'çļĦ', 'ä¸Ģ个', 'ä¸Ĭ', 'å¸Ĥ', 'ä¿¡', 'æģ¯', 'çļĦ', '建', 'è®', '®', 'ï', '¼', 'Į', 'åĴ', '±', '们', 'æĮģ', 'ç»', 'Ń', 'åħ³', '注', 'çļĦ', 'ãĢĤ', '<|10.28|>']], sequences_ids=[[50364, 50260, 50467, 50260, 50567, 50259, 20715, 23414, 2131, 171, 120, 234, 23414, 2131, 171, 120, 234, 27908, 22064, 8861, 46514, 9709, 11340, 4450, 238, 7945, 171, 120, 234, 4450, 238, 7945, 8623, 9487, 10187, 18390, 44059, 33083, 39004, 19550, 16866, 42623, 4200, 6240, 171, 120, 234, 8975, 109, 9497, 1541, 17944, 14872, 99, 24881, 10960, 31906, 47, 12586, 17819, 99, 10178, 114, 1546, 20182, 5708, 27261, 17665, 26460, 1546, 34157, 7422, 106, 171, 120, 234, 8975, 109, 9497, 17694, 10115, 255, 28053, 26432, 1546, 1543, 50878]], scores=[-0.5639367699623108], no_speech_prob=0.0)

    2) using initial_prompt: WhisperGenerationResult(sequences=[['<|0.00|>', '<|zh|>', '<|0.26|>', '<|zh|>', '<|2.06|>', '<|zh|>', '<|2.20|>', '<|zh|>', '<|3.08|>', '<|zh|>', '<|3.76|>', '<|zh|>', '<|4.06|>']], sequences_ids=[[50364, 50260, 50377, 50260, 50467, 50260, 50474, 50260, 50518, 50260, 50552, 50260, 50567]], scores=[-2.519230842590332], no_speech_prob=0.0)

  2. cpu decoding

    When I convert the official Whisper model to CTranslate2 format and decode on CPU, the output is normal. When I convert my fine-tuned Whisper model to CTranslate2 format and decode on CPU, I get an empty result.
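For readers unfamiliar with how an initial prompt reaches the decoder, here is a minimal sketch of the usual Whisper prompt layout: the prompt text is placed after a `<|startofprev|>` token and before `<|startoftranscript|>`. The helper name `build_whisper_prompt` and its parameters are hypothetical and for illustration only; the special-token strings are the standard multilingual Whisper ones, and the prompt tokens themselves are assumed to come from whatever tokenizer you already use.

```python
from typing import List, Optional


def build_whisper_prompt(
    language: str,
    task: str = "<|transcribe|>",
    initial_prompt_tokens: Optional[List[str]] = None,
    timestamps: bool = True,
) -> List[str]:
    """Assemble the decoder prompt as a list of token strings.

    Hypothetical helper: the layout follows the usual Whisper convention,
    but check it against the tooling you actually use.
    """
    prompt: List[str] = []
    if initial_prompt_tokens:
        # Previous-context text goes before the start-of-transcript token.
        prompt.append("<|startofprev|>")
        prompt.extend(initial_prompt_tokens)
    prompt.extend(["<|startoftranscript|>", language, task])
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt


# Without an initial prompt the decoder only sees the three control tokens.
plain = build_whisper_prompt("<|zh|>")
# With an initial prompt (placeholder tokens here) the context is prepended.
prompted = build_whisper_prompt("<|zh|>", initial_prompt_tokens=["tok_a", "tok_b"])
```

A list like `prompted` is what would be passed as one entry of the `prompts` argument of `ctranslate2.models.Whisper.generate(features, prompts)`. Comparing the output for `plain` and `prompted` on the same audio is a quick way to confirm that the `<|startofprev|>` context is what breaks decoding.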

ben-8878 avatar Dec 06 '23 02:12 ben-8878

This is because Whisper was originally also trained on examples that included an initial prompt. If such examples were absent from your fine-tuning dataset, the model loses this ability.

This is a problem with your fine-tuning and will be present independently of your conversion to CT2.
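One way to keep the prompt ability during fine-tuning is to include some training examples that carry previous-context text. The sketch below shows one possible construction, not the method of any specific training script: the function name `build_prompted_example` is hypothetical, the `<|startofprev|>` ID must be looked up in your own tokenizer, and `-100` is assumed as the ignore index (the PyTorch cross-entropy default) so that no loss is computed on the prompt portion. Shifting labels into decoder inputs is left to the training framework.

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # assumed ignore index (PyTorch cross-entropy default)


def build_prompted_example(
    prev_ids: Sequence[int],
    target_ids: Sequence[int],
    sot_prev_id: int,
) -> Tuple[List[int], List[int]]:
    """Prepend previous-context tokens and mask them out of the loss.

    Hypothetical helper: `target_ids` is the normal training target
    (start-of-transcript, language, task, text, end-of-text).
    """
    # Only add the <|startofprev|> prefix when there is context to add.
    prefix = ([sot_prev_id] + list(prev_ids)) if prev_ids else []
    input_ids = prefix + list(target_ids)
    # The model should condition on the prefix but not be trained to predict it.
    labels = [IGNORE_INDEX] * len(prefix) + list(target_ids)
    return input_ids, labels


# Placeholder IDs for illustration; 50361 stands in for <|startofprev|>.
ids, labels = build_prompted_example([11, 12], [50258, 50260, 50359, 7, 50257], 50361)
```

Mixing examples with and without a prefix lets the fine-tuned model handle both cases at inference time.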

NeonBohdan avatar Jan 12 '24 17:01 NeonBohdan