vosk-api
Vosk versus Picovoice: a quick comparison
Vosk vs Picovoice (Leopard)
Models used:
- Vosk model: En-US 0.22
- Picovoice model: built-in English model (needs an access key, which can be obtained by logging in to the Picovoice Console)
Procedure: I wanted to compare Vosk with another highly touted ASR project, Picovoice (Leopard), but this time with a simpler file that has less audio complexity than the one in my earlier comparison in #892. The video is also short: trailer 5 of Batman 2022, containing better stereo audio in PCM format (2300 kb/s at 48 kHz).
The procedure was the same as in #892, except that spleeter was not used and the audio file was simpler (no US slang, bad words, etc.). It does, however, contain low voices and a range of voice pitches.
Results:
- Pico unprocessed: WER 62.162% (161 / 259), WRR 39.382% (102 / 259)
- Pico processed: WER 61.776% (160 / 259), WRR 40.154% (104 / 259)
- Vosk unprocessed: WER 111.446% (185 / 166), WRR 4.819% (8 / 166)
- Vosk processed: WER 62.348% (154 / 247), WRR 37.652% (93 / 247)
SER (sentence error rate) was again 100% in both cases.
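For anyone who wants to reproduce this kind of score, here is a minimal sketch of how WER and WRR can be computed from a reference and a hypothesis transcript. It assumes the conventional definitions, WER = (substitutions + deletions + insertions) / reference words and WRR = matched words / reference words; the scoring tool actually used for the numbers in this post may differ in details such as text normalization. The function name `word_error_stats` is made up for this example. Under these definitions insertions count against WER but not against WRR, which is why the two need not add up to 100% and why WER can exceed 100%.

```python
def word_error_stats(reference: str, hypothesis: str) -> dict:
    """Word-level alignment stats (assumed definitions, see the text above)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    n, m = len(ref), len(hyp)
    if n == 0:
        raise ValueError("reference transcript is empty")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(diagonal, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Walk back through the table to count each kind of edit.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1

    return {
        "hits": hits,
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "wer": (subs + dels + ins) / n,
        "wrr": hits / n,
    }


stats = word_error_stats("a b c d", "a x c d e")
print(f"WER: {stats['wer']:.3%}  WRR: {stats['wrr']:.3%}")
```

In the toy example one word is substituted and one is inserted against a four-word reference, so WER is 2 / 4 and WRR is 3 / 4.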
Conclusion: Picovoice does outperform Vosk on these scores, but other important factors should be considered:
- Picovoice only allows 360000 seconds (100 hours) of free usage per month, and an access key has to be obtained online. Although the processing appears to run offline, the key needs to be authenticated online.
- Only an English model is available.
- Picovoice also allows Speech-to-Text models with custom vocabularies: you can add new words with custom pronunciations to fine-tune the model (a smart and practical way to improve accuracy).
- Post-processing the Vosk output with a spell checker brings its scores roughly on par with Picovoice.
Files:
Originals: The original trailer can be downloaded from the link described in the procedure for your own analysis. The original SRT was obtained from YouTube, with basic processing carried out in Notepad++ (as in #892).
*1) base.txt
2) pico.txt
*3) vosk.txt
- Please rename base.txt and vosk.txt to base.srt and vosk.srt, as GitHub does not allow .srt file uploads.
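If you would rather script the SRT clean-up than do it by hand, the sketch below shows one way to reduce an SRT file to plain text before scoring. This is not the Notepad++ procedure from #892, only a rough, hypothetical equivalent: the helper name `srt_to_text` is made up, and it assumes a standard SRT layout (index line, `-->` timing line, caption lines).

```python
import re


def srt_to_text(srt: str) -> str:
    """Drop cue numbers, timing lines and markup tags; keep caption text."""
    kept = []
    for line in srt.splitlines():
        line = line.lstrip("\ufeff").strip()  # tolerate a leading BOM
        if not line:
            continue  # blank separator between cues
        if line.isdigit():
            continue  # cue index (note: also drops captions that are only digits)
        if "-->" in line:
            continue  # timing line, e.g. 00:00:01,000 --> 00:00:02,500
        line = re.sub(r"<[^>]+>", "", line)  # strip tags such as <i>...</i>
        if line:
            kept.append(line)
    return " ".join(kept)


sample = (
    "1\n00:00:01,000 --> 00:00:02,500\n<i>Hello there.</i>\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nHow are you?\n"
)
print(srt_to_text(sample))  # Hello there. How are you?
```

Punctuation and casing are left untouched here; normalize them the same way for the reference and for each hypothesis before comparing.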
Processed (spell correction)
Enjoy!
Hi @ls-milkyway, which project was used for the spell correction?
Read https://github.com/alphacep/vosk-api/issues/892, it's mentioned there. In fact, there are many AI-based spell correctors; try a new one to see if you get better results in post-processing.