Some segments produce no transcription output
I am testing an audio file here. Within the first 30 seconds, no transcription is produced for the span from 6 seconds to 30 seconds.
As shown in the figure above, there is audio content from 5.48 seconds to 30 seconds, but there is no output.
The parameters I use:

segments, info = model.transcribe("test.mp3", beam_size=5, best_of=5, initial_prompt="This is a Mandarin")
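To confirm exactly which part of the file was skipped, here is a minimal sketch; the `find_gaps` helper and the hard-coded segment tuples are hypothetical (not part of the faster-whisper API), assuming each transcribed segment exposes a start and end time in seconds:

```python
def find_gaps(segments, audio_end, min_gap=1.0):
    """Return (start, end) spans with no transcription, longer than min_gap seconds.

    segments  -- iterable of (start, end) pairs from the transcription
    audio_end -- total duration of the audio in seconds
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(segments):
        # A gap is any stretch between the end of the last covered span
        # and the start of the next segment.
        if start - cursor > min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing gap after the last segment, if any.
    if audio_end - cursor > min_gap:
        gaps.append((cursor, audio_end))
    return gaps

# Example: one segment covering 0 s -> 6 s in a 30 s clip,
# matching the symptom described above.
print(find_gaps([(0.0, 6.0)], 30.0))  # -> [(6.0, 30.0)]
```

Running this over the real segment list (e.g. `[(s.start, s.end) for s in segments]`) makes it easy to see whether the model emitted nothing at all for 6 s to 30 s or merely produced empty text.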
Can you tell me where to start solving this? Thank you very much.
Here is the test audio file: https://drive.google.com/file/d/1OYM3lTEqAtyGg7GfqwPFDo3T2e9NBmHM/view?usp=sharing
It's a singing audio; in particular, the segment from 5 s to 30 s is a vocal run.
Whisper works better with conversational / speech audio.
Thanks for your reply. I solved this problem by fine-tuning openai/whisper's large model. Is this the only solution?
Can you share more details about the fine-tuning? I am facing the same issue.