CTranslate2
Fine-tuned Whisper model cannot use "initial_prompt"
-
I converted the official Whisper model to CTranslate2 format and can use "initial_prompt" normally. But when I convert my fine-tuned Whisper model to CTranslate2 format and use "initial_prompt", I get a strange or empty result. Decoding the same audio with my fine-tuned model, the WhisperGenerationResult without and with initial_prompt is: 1) not using initial_prompt:
WhisperGenerationResult(sequences=[['<|0.00|>', '<|zh|>', '<|2.06|>', '<|zh|>', '<|4.06|>', '<|en|>', '好çļĦ', 'æĤ¨', '好', 'ï', '¼', 'Į', 'æĤ¨', '好', 'ï', '¼', 'Į', '请', 'éĹ®', 'ä¸Ģä¸ĭ', 'æĥħåĨµ', 'æĦŁ', 'è°¢', 'èĢ', 'IJ', 'å¿ĥ', 'ï', '¼', 'Į', 'èĢ', 'IJ', 'å¿ĥ', 'åĨį', '次', 'çŃī', 'å¾ħ', 'å¼ł', 'åħĪçĶŁ', '缮åīį', 'éĢļ', 'è¿ĩ', 'æŁ¥', 'çľĭ', 'åij¢', 'ï', '¼', 'Į', 'åĴ', '±', '们', 'æĺ¯', '没æľī', 'åĿ', '¦', 'åħĭ', 'ä¸ī', 'çĻ¾', 'P', 'TV', 'è½', '¦', 'é¡', '¶', 'çļĦ', 'ä¸Ģ个', 'ä¸Ĭ', 'å¸Ĥ', 'ä¿¡', 'æģ¯', 'çļĦ', '建', 'è®', '®', 'ï', '¼', 'Į', 'åĴ', '±', '们', 'æĮģ', 'ç»', 'Ń', 'åħ³', '注', 'çļĦ', 'ãĢĤ', '<|10.28|>']], sequences_ids=[[50364, 50260, 50467, 50260, 50567, 50259, 20715, 23414, 2131, 171, 120, 234, 23414, 2131, 171, 120, 234, 27908, 22064, 8861, 46514, 9709, 11340, 4450, 238, 7945, 171, 120, 234, 4450, 238, 7945, 8623, 9487, 10187, 18390, 44059, 33083, 39004, 19550, 16866, 42623, 4200, 6240, 171, 120, 234, 8975, 109, 9497, 1541, 17944, 14872, 99, 24881, 10960, 31906, 47, 12586, 17819, 99, 10178, 114, 1546, 20182, 5708, 27261, 17665, 26460, 1546, 34157, 7422, 106, 171, 120, 234, 8975, 109, 9497, 17694, 10115, 255, 28053, 26432, 1546, 1543, 50878]], scores=[-0.5639367699623108], no_speech_prob=0.0)
2) using initial_prompt: WhisperGenerationResult(sequences=[['<|0.00|>', '<|zh|>', '<|0.26|>', '<|zh|>', '<|2.06|>', '<|zh|>', '<|2.20|>', '<|zh|>', '<|3.08|>', '<|zh|>', '<|3.76|>', '<|zh|>', '<|4.06|>']], sequences_ids=[[50364, 50260, 50377, 50260, 50467, 50260, 50474, 50260, 50518, 50260, 50552, 50260, 50567]], scores=[-2.519230842590332], no_speech_prob=0.0)
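For reference, the sequences_ids dumps above can be decoded by hand: in the multilingual Whisper vocabulary, timestamp tokens start at ID 50364 (<|0.00|>) and advance in 0.02 s steps, which is why ID 50467 appears as '<|2.06|>'. A minimal sketch (the helper name is my own, not part of any API):

```python
# Multilingual Whisper vocabulary: timestamp tokens begin at <|0.00|>
# (ID 50364) and advance in steps of 0.02 seconds.
TIMESTAMP_BEGIN = 50364

def timestamp_of(token_id):
    """Return the time in seconds encoded by a Whisper timestamp token."""
    if token_id < TIMESTAMP_BEGIN:
        raise ValueError("not a timestamp token")
    return (token_id - TIMESTAMP_BEGIN) * 0.02

# IDs taken from the results above: 50467 -> 2.06 s, 50878 -> 10.28 s.
```

Reading the second result this way shows it contains only alternating timestamp and language tokens, i.e. no actual transcription text.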
-
CPU decoding
I converted the official Whisper model to CTranslate2 format, and decoding on CPU works normally. When I convert my fine-tuned Whisper model to CTranslate2 format and decode on CPU, I get an empty result.
This is because Whisper was originally trained on examples where an initial prompt is fed as well. If such examples were absent from your fine-tuning dataset, this feature will stop working.
This is a problem with your fine-tune and will be present independently of your conversion to CT2.
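To make the answer concrete: during training, Whisper sometimes prepends the previous segment's text, marked by the <|startofprev|> token, before the usual <|startoftranscript|> sequence, and initial_prompt reuses that mechanism at inference time. A fine-tune that never saw <|startofprev|>-prefixed examples loses the ability to condition on it. A minimal sketch of how the decoder input is assembled (the token IDs are the standard multilingual-Whisper special tokens; the function is illustrative, not the CTranslate2 API):

```python
SOT_PREV = 50361         # <|startofprev|> in the multilingual Whisper vocabulary
MAX_PROMPT_TOKENS = 223  # Whisper keeps at most n_text_ctx // 2 - 1 prompt tokens

def build_decoder_input(initial_prompt_ids, forced_ids):
    """Prepend an initial prompt (as token IDs) to the forced decoder tokens.

    forced_ids is the usual [<|startoftranscript|>, <|language|>, <|task|>, ...]
    sequence. The prompt only helps if the model saw <|startofprev|>-prefixed
    examples during (fine-)tuning.
    """
    tokens = []
    if initial_prompt_ids:
        tokens.append(SOT_PREV)
        # Only the most recent tokens of a long prompt are kept.
        tokens.extend(initial_prompt_ids[-MAX_PROMPT_TOKENS:])
    tokens.extend(forced_ids)
    return tokens
```

Wrappers such as faster-whisper perform this tokenization internally when you pass initial_prompt to transcribe(); the model side, however, can only be fixed by including prompted examples in the fine-tuning data.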