Some segments produce no transcription output
I am testing an audio file here. Within the first 30 seconds, no transcription is produced for the span from 6 seconds to 30 seconds.
As shown in the figure above, there is audio content from 5.48 seconds to 30 seconds, but there is no output.
The parameters I use:

segments, info = model.transcribe("test.mp3", beam_size=5, best_of=5, initial_prompt="This is a Mandarin")
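To confirm exactly which part of the file was skipped, here is a minimal sketch; the `find_gaps` helper and the hard-coded segment tuples are hypothetical (not part of the faster-whisper API), assuming each transcribed segment exposes a start and end time in seconds:

```python
def find_gaps(segments, audio_end, min_gap=1.0):
    """Return (start, end) spans with no transcription, longer than min_gap seconds.

    segments  -- iterable of (start, end) pairs from the transcription
    audio_end -- total duration of the audio in seconds
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(segments):
        # A gap is any stretch between the end of the last covered span
        # and the start of the next segment.
        if start - cursor > min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing gap after the last segment, if any.
    if audio_end - cursor > min_gap:
        gaps.append((cursor, audio_end))
    return gaps

# Example: one segment covering 0 s -> 6 s in a 30 s clip,
# matching the symptom described above.
print(find_gaps([(0.0, 6.0)], 30.0))  # -> [(6.0, 30.0)]
```

Running this over the real segment list (e.g. `[(s.start, s.end) for s in segments]`) makes it easy to see whether the model emitted nothing at all for 6 s to 30 s or merely produced empty text.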
Can you tell me where to start solving this? Thank you very much.
Here is the test audio file: https://drive.google.com/file/d/1OYM3lTEqAtyGg7GfqwPFDo3T2e9NBmHM/view?usp=sharing
It's a singing audio; in particular, the segment from 5 s to 30 s is a vocal run.
Whisper works better with conversational / speech audio.
Thanks for your reply. I solved this problem by fine-tuning openai/whisper's large model. Is this the only solution?
Can you share more details about the fine-tuning? I am facing the same issue.