vosk-api
What are the exact audio format requirements?
Could you explain what the exact audio file requirements are? I don't understand why it works when I convert MP3 to WAV but not when I convert PCM or OGG to WAV (all the audio comes from the same source, Amazon Polly).
It would be great to use straight WAV without the need to convert it, to preserve quality.
Thanks
> I don't understand why it works when I convert MP3 to WAV but not when I convert PCM or OGG to WAV (all the audio comes from the same source, Amazon Polly).

We don't understand either. You'd better share the files if you need help.

> It would be great to use straight WAV without the need to convert it, to preserve quality.

Nothing stops you here.
This is the file in MP3 format. If I convert it to WAV, Vosk processes the speech correctly: http://sndup.net/88qg
MP3 format metadata:
Property | Value
-- | --
encoding | mp3
format | fltp
number_of_channel | 1 (mono)
sample_rate | 24000
file_size | 20061 bytes
duration | 3.336s
MP3 converted to WAV:
http://sndup.net/fwpp
WAV metadata :
Property | Value
-- | --
encoding | pcm_s16le
format | s16
number_of_channel | 1 (mono)
sample_rate | 24000
file_size | 160172 bytes
duration | 3.336s
This is another file obtained with the same procedure, but in OGG format. If I convert this file to WAV, my Vosk code doesn't return any result: http://sndup.net/v5zh
OGG format metadata:
Property | Value
-- | --
encoding | vorbis
format | fltp
number_of_channel | 1 (mono)
sample_rate | 24000
file_size | 21215 bytes
duration | 3.28s
OGG converted to WAV : http://sndup.net/nsvf
WAV format metadata:
Property | Value
-- | --
encoding | pcm_s32le
format | s32
number_of_channel | 1 (mono)
sample_rate | 24000
file_size | 314924 bytes
duration | 3.28s
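
The file sizes in the two WAV tables already hint at the difference between the files. A rough sketch of the arithmetic, assuming a canonical 44-byte WAV header with no extra chunks (an assumption, not something the metadata states); `bytes_per_sample` is a hypothetical helper name:

```python
# Assumption: a canonical 44-byte RIFF/WAVE header with no extra chunks.
HEADER_BYTES = 44


def bytes_per_sample(file_size, duration_s, sample_rate, channels=1):
    """Estimate the sample width in bytes from file size and duration."""
    frames = duration_s * sample_rate * channels
    return (file_size - HEADER_BYTES) / frames


# Figures taken from the two WAV metadata tables in this thread.
mp3_wav = bytes_per_sample(160172, 3.336, 24000)
ogg_wav = bytes_per_sample(314924, 3.28, 24000)
print(round(mp3_wav), round(ogg_wav))
```

Under that assumption the MP3-derived WAV comes out at two bytes per sample (16-bit) and the OGG-derived WAV at four bytes per sample (32-bit), which matches the `s16` and `s32` entries in the tables.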
I have also tested raw PCM, but I can't share PCM files through this service.
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        print("Audio file must be WAV format mono PCM.")
        exit(1)
I have added this audio check from your tests, but it passes every time, so I don't really know what's wrong with the PCM and OGG files. Maybe it's the conversion procedure?
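
To see which condition a file violates, the check quoted above can be split so that each property is reported separately. This is only a sketch using Python's standard `wave` module; `check_wav` is a hypothetical helper, not part of the Vosk API:

```python
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the file passes the check."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected 1 channel, got {wf.getnchannels()}")
        # getsampwidth() returns the sample width in bytes, so 2 means 16-bit.
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wf.getsampwidth()}-bit")
        if wf.getcomptype() != "NONE":
            problems.append(f"expected uncompressed PCM, got {wf.getcomptype()}")
    return problems
```

Calling `check_wav("file.wav")` and printing the result shows the actual channel count and sample width instead of a single pass/fail message.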
Thank you very much
Hi. The second file has a sample size of 32 bits. We need 16:
format | s32
To force conversion to 16 bit, use the following command:
ffmpeg -i <input_file> -ar 16000 -ac 1 -acodec pcm_s16le file.wav
It will then work fine.
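
If the conversion has to happen from a script, the same command can be run through `subprocess`. A minimal sketch, assuming `ffmpeg` is installed and on the `PATH`; `ffmpeg_command` and `convert` are hypothetical helper names, and the flags are exactly the ones from the command above:

```python
import subprocess


def ffmpeg_command(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list for 16-bit mono PCM WAV output."""
    return [
        "ffmpeg",
        "-i", src,
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # one channel (mono)
        "-acodec", "pcm_s16le",   # signed 16-bit little-endian PCM
        dst,
    ]


def convert(src, dst, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_command(src, dst, sample_rate), check=True)
```

For example, `convert("speech.ogg", "speech.wav")` would write a 16 kHz, 16-bit mono file (the file names here are placeholders).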
Thanks :) I suspected this when I posted here, but I went to do something else and meant to fix it later. Thank you for confirming.
Great. We might need a method to check the file in the API one day.
That'd be great, but I think it would be enough for now to add it to the documentation. That way you can avoid people complaining about it. Anyway, thanks for the effort!
If a built-in decoder is used, the format should be mono WAV, not WAVEX. ffmpeg accepts any format. WAVEX is the extensible WAV format: it has a different file header but the usual .wav extension.
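
Because the extension is the same, the two header types can only be told apart by reading the file. A rough sketch that reads the format tag from the `fmt ` chunk of a RIFF/WAVE file; `wav_format_tag` is a hypothetical helper, and the two constants are the standard tag values for plain PCM and for the extensible format:

```python
import struct

WAVE_FORMAT_PCM = 0x0001         # plain PCM header
WAVE_FORMAT_EXTENSIBLE = 0xFFFE  # extensible ("wavex") header


def wav_format_tag(path):
    """Return the format tag stored in the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        riff_header = f.read(12)
        if len(riff_header) < 12:
            raise ValueError("file too short to be a WAV file")
        riff, _size, wave_id = struct.unpack("<4sI4s", riff_header)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                (tag,) = struct.unpack("<H", f.read(2))
                return tag
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

A result equal to `WAVE_FORMAT_EXTENSIBLE` would indicate the extensible header described above, in which case re-encoding the file is the safer route.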