
Two problems when using vosk-browser with non-streaming, separated static waveforms

Open lheine10 opened this issue 1 year ago • 2 comments

Hi,

I'm trying to use vosk-browser with several static waveforms, each 2 to 10 seconds long.

I'm feeding each waveform to the recognizer with acceptWaveformFloat().

There are two problems:

1. There is no way to reset the start/end times of the words. The values keep growing with every new (independent) waveform.

2. The onResult handler fires unreliably. It seems to be triggered automatically if the waveform ends with enough silence; if it doesn't, the handler isn't triggered at all. I can force a result with retrieveFinalResult(), but then the handler fires twice for waveforms that do end in silence.
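For problem 1, one possible workaround (a sketch, not a vosk-browser feature) is to count the samples fed to the recognizer and subtract each clip's starting offset from the word timestamps. It assumes the word entries use Vosk's word/start/end/conf fields, with times in seconds measured from the first sample the recognizer received; ClipTimeline and its methods are hypothetical names.

```typescript
// Shape of a Vosk word entry (assumed: times in seconds since the recognizer started).
interface WordResult {
  word: string;
  start: number;
  end: number;
  conf: number;
}

// Hypothetical helper, not part of vosk-browser: tracks where each clip
// begins on the recognizer's continuous timeline.
class ClipTimeline {
  private samplesFed = 0;
  private clipStartSample = 0;

  // Call right before feeding a new, independent clip.
  beginClip(): void {
    this.clipStartSample = this.samplesFed;
  }

  // Call with the length of every buffer passed to acceptWaveformFloat(),
  // including any padding silence.
  recordSamples(count: number): void {
    this.samplesFed += count;
  }

  // Returns copies of the words with times relative to the current clip's start.
  rebase(words: WordResult[], sampleRate: number): WordResult[] {
    const offsetSec = this.clipStartSample / sampleRate;
    return words.map((w) => ({
      ...w,
      start: w.start - offsetSec,
      end: w.end - offsetSec,
    }));
  }
}
```

Because vosk-browser delivers results asynchronously, rebase() has to run for a clip before beginClip() is called for the next one; otherwise, store each clip's offset yourself and subtract that instead.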

lheine10 · Mar 01 '23 15:03

@lheine10, here's another workaround, admittedly not ideal:

You can pass silence samples to acceptWaveformFloat() to trigger the final result.

I'm not a maintainer or otherwise official on the project, so please don't read this suggestion as "we won't fix it."

erikh2000 · May 14 '23 19:05

Example of forcing a result:

// kaldiSampleRate = whatever kaldiRecognizer was constructed with.
const silenceSamples = createSilenceSamples(kaldiSampleRate, 2000);
kaldiRecognizer.acceptWaveformFloat(silenceSamples, kaldiSampleRate);

Code for createSilenceSamples(): https://github.com/erikh2000/sl-web-audio/blob/main/src/generating/silenceUtil.ts
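If you'd rather not pull in that dependency, here is a minimal stand-in. It assumes, as the call above suggests, that the second argument is a duration in milliseconds; it is not the sl-web-audio code, and padWithSilence() and silenceSampleCount() are hypothetical helpers that append the silence to a clip so one acceptWaveformFloat() call carries both the audio and the trailing pause.

```typescript
// Number of samples in durationMs of audio at sampleRate (assumed: ms units).
function silenceSampleCount(sampleRate: number, durationMs: number): number {
  return Math.round((sampleRate * durationMs) / 1000);
}

// Stand-in for sl-web-audio's createSilenceSamples():
// a new Float32Array is zero-filled, which is digital silence.
function createSilenceSamples(sampleRate: number, durationMs: number): Float32Array {
  return new Float32Array(silenceSampleCount(sampleRate, durationMs));
}

// Hypothetical helper: copy the clip into a longer buffer whose
// trailing samples stay 0.0, i.e. clip followed by silence.
function padWithSilence(clip: Float32Array, sampleRate: number, durationMs: number): Float32Array {
  const padded = new Float32Array(clip.length + silenceSampleCount(sampleRate, durationMs));
  padded.set(clip, 0);
  return padded;
}
```

Usage would then be kaldiRecognizer.acceptWaveformFloat(padWithSilence(clip, kaldiSampleRate, 2000), kaldiSampleRate). Note that the padding also advances the recognizer's word timestamps, so any offset tracking has to count the padded length, not just the clip.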

erikh2000 · May 14 '23 19:05