vosk-api Vosk Versus Pico Voice - A quick comparison.

Vosk Vs Pico Voice (leopard)

Models Used:
Vosk Model: En-US 0.22
Pico Voice Model: En-Inbuilt (needs an access key, which can be obtained by logging in to the Pico Voice Console).

Procedure: I wanted to compare Vosk with another highly touted ASR project, Picovoice (leopard), but this time using a simple file with less audio complexity than in my earlier comparison in #892. The video is also short: trailer 5 of Batman 2022, which has better stereo audio in PCM format (2300 kb/s at 48 kHz).

The procedure was the same as in #892, except that spleeter was not used and the audio file was simple (no US slang, bad words, etc.). It does, however, contain low and varied voice pitches.

Results:

Pico unprocessed

WER: 62.162% (161 / 259)
WRR: 39.382% (102 / 259)

Pico processed

WER: 61.776% (160 / 259)
WRR: 40.154% (104 / 259)

Vosk unprocessed

WER: 111.446% (185 / 166)
WRR: 4.819% (8 / 166)

Vosk processed

WER: 62.348% (154 / 247)
WRR: 37.652% (93 / 247)

SER (sentence error rate) was again 100% in both cases.
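
For readers wondering how a WER above 100% (as in the unprocessed Vosk run, 185 / 166) is possible, here is a minimal sketch of the usual definitions. It assumes the scores above follow the standard formulas, WER = (substitutions + deletions + insertions) / reference words and WRR = correct words / reference words; the scoring tool is not named in this post, so treat this as an illustration and not as the exact method used. The function name `word_error_stats` and the sample sentences are made up for the example.

```python
def word_error_stats(reference: str, hypothesis: str) -> dict:
    """Word-level WER/WRR via a minimum edit distance alignment (illustrative only)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Each cell holds (errors, hits, substitutions, deletions, insertions)
    # for ref[:i] aligned against hyp[:j]. Row 0: everything is an insertion.
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, 0, i, 0)]  # column 0: everything is a deletion
        for j in range(1, len(hyp) + 1):
            e, h, s, d, ins = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, h + 1, s, d, ins)      # correct word
            else:
                diag = (e + 1, h, s + 1, d, ins)  # substitution
            e, h, s, d, ins = prev[j]
            delete = (e + 1, h, s, d + 1, ins)    # reference word missing
            e, h, s, d, ins = cur[j - 1]
            insert = (e + 1, h, s, d, ins + 1)    # extra word in hypothesis
            # Fewest errors wins; ties are broken in favour of more hits.
            cur.append(min(diag, delete, insert, key=lambda c: (c[0], -c[1])))
        prev = cur

    errors, hits, subs, dels, inss = prev[-1]
    n = len(ref)
    return {
        "n": n, "errors": errors, "hits": hits,
        "substitutions": subs, "deletions": dels, "insertions": inss,
        "wer": 100.0 * errors / n,  # can exceed 100 when there are many insertions
        "wrr": 100.0 * hits / n,
    }


if __name__ == "__main__":
    print(word_error_stats("the bat signal is on", "the bad signal is on now"))
```

Under these definitions WER + WRR = 100% + insertions / reference words, so the two figures only sum to 100% when the hypothesis contains no inserted words.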

Conclusion: Pico Voice does outperform Vosk on these scores, but other important factors should be considered:

Pico Voice only allows free usage of 360000 seconds per month, and an access key must be obtained online. Although the processing appears to be offline, the key needs to be authenticated online.
Only an English model is available.
Pico Voice also allows speech-to-text models with custom vocabularies: you can add new words with custom pronunciations to fine-tune the model (a smart and practical way to improve accuracy).
Post-processing Vosk output with a spell checker brings its accuracy on par with Pico Voice.

Files:

Originals: The original trailer can be downloaded from the link described in the procedure for your own analysis. The original SRT was obtained from YouTube, with basic processing carried out using Notepad++ (as in #892).
1) base.txt
2) pico.txt
3) vosk.txt

Please rename base.txt & vosk.txt to base.srt & vosk.srt, as GitHub does not allow .srt file uploads.

Processed (spell correction)

Enjoy!

Apr 02 '22 16:04 ls-milkyway

Hi @ls-milkyway, which project was used for the spell correction?

Jun 08 '22 21:06 base21

Hi @ls-milkyway, which project was used for the spell correction?

Read https://github.com/alphacep/vosk-api/issues/892; it's mentioned there. In fact, there are many AI-based spell correctors. Try a new one to see if you get better results in post-processing.

Jun 22 '22 01:06 ls-milkyway
