whisper.cpp
Unexpected output with word-level timestamps & dual channel (stereo)
Hi, thank you for recently adding stereo (dual channel) support. It works great with utterance-level timestamps, but when I use -ml 1 to do word-level timestamps, I get some unexpected output.
Please see the attached gist for the output.
Here is the WAV file I used (I used ffmpeg to convert it to 16 kHz: ffmpeg -i commercial_stereo.wav -ar 16000 commercial_stereo_16k.wav).
Hi, can you clarify which part is unexpected?
Is it that the diarization is not as good as when you aren't using -ml 1? It can definitely be improved; I just didn't have any data to test with. Using this file, I think I can try to make it a bit better.
Sorry, I forgot to include that detail. If you look at lines 62, 63, 96, and 97, the speaker is not identified.