RealtimeSTT
The accuracy issue of real-time Speech-to-Text (STT) transcription
The text passed to the callback in recorder.text(process_text) often contains repeated content, or arrives late with several segments accumulated. Are there recommended reference values for the recorder_config parameters? Thanks. For reference, the recorder is created with: recorder = AudioToTextRecorder(**recorder_config)
This sounds more like the behaviour of the on_realtime_transcription_update callback. It definitely should not occur with the default parameter set. My first guess would be that you are using the same callback for both the on_transcription_finished callback of the text method and the on_realtime_transcription_update callback of the AudioToTextRecorder constructor.
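To illustrate the guess above, here is a minimal sketch that keeps the two callbacks separate. The function names, the two lists, and the "tiny" model choice are assumptions made for illustration, not taken from the original poster's code. The parameters enable_realtime_transcription and on_realtime_transcription_update are passed to the AudioToTextRecorder constructor, while the final text goes to the callback given to recorder.text().

```python
# Partial (realtime) updates and finished sentences are kept in separate places,
# so repeated partial text never ends up mixed into the final output.
realtime_updates = []   # hypothetical buffer for in-progress text
final_sentences = []    # hypothetical buffer for finished sentences


def on_realtime_update(text):
    # Called repeatedly while a sentence is still being spoken;
    # each call carries the current partial transcription.
    realtime_updates.append(text)


def on_final_text(text):
    # Called once per finished sentence via recorder.text(...).
    final_sentences.append(text)


recorder_config = {
    "model": "tiny",  # assumption: any supported model name works here
    "enable_realtime_transcription": True,
    "on_realtime_transcription_update": on_realtime_update,
}


def run():
    # Imported here so the sketch can be read and tested without a microphone.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(**recorder_config)
    while True:
        recorder.text(on_final_text)
```

Calling run() starts the recording loop. If the same function were used for both callbacks, every partial update would show up next to the final sentences, which would look like repeated or accumulating text.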
Some updates on this. An older faster-whisper version probably caused this (it somehow got corrupted on PyPI); I think it was 0.6.0. Newer versions are fine.
Thanks! Should I use version 0.6.0 of faster-whisper instead of the latest [v1.0.1](https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.0.1)? Or just update to the latest faster-whisper / RealtimeSTT version?
You can upgrade RealtimeSTT to the newest version, which uses the latest faster-whisper 1.0.1 (this version is also referenced in the requirements file of RealtimeSTT).
Great! Another question: the latest v0.1.15 of RealtimeSTT has the parameter beam_size. Can it be used to reduce the delay?
You trade off accuracy against speed: a larger beam_size yields better-quality output because the model can explore more candidate sequences and is less likely to commit early to a locally good but globally worse hypothesis. But it also means slower performance, because more sequences are evaluated at each step. So a smaller beam_size can reduce the delay, at some cost in accuracy.
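The trade-off can be seen in a toy beam search. This is only an illustrative sketch with a made-up two-step probability table (TRANSITIONS and beam_search are hypothetical names); it is not how faster-whisper implements decoding internally.

```python
import math

# Hypothetical toy model: probability of the next token given the previous one.
TRANSITIONS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"x": 0.9, "y": 0.1},
}


def beam_search(beam_size, steps=2):
    """Return (best tokens, their probability, number of candidates scored)."""
    beams = [(0.0, ["<s>"])]  # each beam is (log-probability, tokens so far)
    evaluated = 0
    for _ in range(steps):
        candidates = []
        for score, tokens in beams:
            for token, prob in TRANSITIONS[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
                evaluated += 1
        # Keep only the beam_size highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    best_score, best_tokens = beams[0]
    return best_tokens[1:], math.exp(best_score), evaluated


greedy = beam_search(beam_size=1)  # commits to "a" early, scores 4 candidates
wider = beam_search(beam_size=2)   # also keeps "b", scores 6 candidates
```

With beam_size=1 the search commits to "a" and ends with a sequence of probability 0.30; with beam_size=2 it keeps "b" alive and finds "b x" at 0.36, but scores more candidates along the way. In RealtimeSTT the corresponding knob is the beam_size entry in recorder_config.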