wit
Behavior changes in Speech API for mp3 files (PT)
Do you want to request a feature, report a bug, or ask a question about wit?
bug
What is the current behavior?
Passing an 8 kHz mp3 file to the Speech API (Portuguese) often produces responses containing only whitespace (" "), and transcription accuracy is lower than before.
If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.
Make a POST request to the /speech endpoint with an mp3 file, using an app configured for Portuguese.
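For reference, the reproduction step can be sketched in Python using only the standard library. This is a minimal sketch, not an official client. The URL `https://api.wit.ai/speech`, the `Bearer` token header and the `audio/mpeg3` content type are assumptions based on the public Wit.ai docs, so check them against the current API reference. `build_speech_request` and `send` are hypothetical helper names.

```python
import urllib.request

# Assumed endpoint for the Wit.ai Speech API; verify against the current docs.
SPEECH_URL = "https://api.wit.ai/speech"


def build_speech_request(audio: bytes, token: str, content_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is the raw audio bytes."""
    return urllib.request.Request(
        SPEECH_URL,
        data=audio,
        method="POST",
        headers={
            # Assumed auth scheme: the app's access token as a Bearer token.
            "Authorization": f"Bearer {token}",
            # Assumed MIME type, e.g. "audio/mpeg3" for mp3 or "audio/wav" for WAV.
            "Content-Type": content_type,
        },
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text (not parsed)."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage with a placeholder token: `send(build_speech_request(open("sample.mp3", "rb").read(), "YOUR_TOKEN", "audio/mpeg3"))`. The body is returned unparsed because the response shape may vary between API versions.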
What is the expected behavior?
The same results as before the problem started.
If applicable, what is the App ID where you are experiencing this issue? If you do not provide this, we cannot help.
App Id: 2621810151445138
Hi @DiegoHSeto,
Thanks for reporting. Let me look into it.
In the meantime, upgrading to 16 kHz sample rate flac/raw/wav, as recently suggested in the API docs, would both solve the issue and speed up processing. It's something you might want to look into.
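Here is a minimal sketch of that conversion. It assumes the `ffmpeg` command-line tool is installed. `build_ffmpeg_command` and `convert` are hypothetical helpers. The flags are standard ffmpeg options for channel count (`-ac`), sample rate (`-ar`) and audio codec (`-c:a`). The 16 kHz / mono / 16-bit PCM target is taken from this thread.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Return the ffmpeg argument list for a 16 kHz, mono, 16-bit PCM WAV conversion."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file (mp3, ADPCM WAV, ...)
        "-ac", "1",           # downmix to a single channel
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Resampling an 8 kHz recording to 16 kHz changes the file format. It cannot restore frequency content the original recording never captured.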
Hello @patapizza,
Thanks for the quick response.
I'm still having problems using WAV at a 16 kHz sample rate. I'm getting several different results from the API, for example:
Expected result (a 5-second sample of the audio):
"me chamo Luana, falo do grupo Katton sou especialista do Hambek tudo bem"
API results:
1 - "me chamo Luana, falo do grupo Kat e sou especialista do tudo"
2 - "me chamo Luana, falo do grupo Kat e sou especialista do Hambek , tudo bem"
3 - "Me chamo Luana, falo do grupo Kat e sou especialista do tudo"
4 - "Me chamo Luana, falo do grupo e sou especialista do tudo"
I know the result depends on the quality of the audio, but I had never experienced this behavior with these samples before. Some words are being dropped from the result, while other words like "Katton" (which were previously predicted correctly) are now predicted wrongly. Sometimes the API returns empty text, too.
Hi @DiegoHSeto, can you confirm the encoding of the audio using the `file` command?
Before conversion:
RIFF (little-endian) data, WAVE audio, Microsoft ADPCM, stereo 8000 Hz
After conversion (the file passed to the API):
RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
Stereo isn't supported. The new format looks good. Are the files playing normally?
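The properties reported by `file` can also be checked from Python with the standard-library `wave` module. This is a minimal sketch that assumes the target is the format discussed here: mono, 16-bit PCM, 16 kHz. `check_params` and `check_wav` are hypothetical names.

```python
import wave

# Target format assumed from this thread, not from an official spec.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_RATE = 16000


def check_params(nchannels: int, sampwidth: int, framerate: int) -> list[str]:
    """Return human-readable problems; an empty list means the format matches."""
    problems = []
    if nchannels != EXPECTED_CHANNELS:
        problems.append(f"expected mono, got {nchannels} channels")
    if sampwidth != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"expected 16-bit samples, got {8 * sampwidth}-bit")
    if framerate != EXPECTED_RATE:
        problems.append(f"expected {EXPECTED_RATE} Hz, got {framerate} Hz")
    return problems


def check_wav(path: str) -> list[str]:
    """Inspect a WAV header and report any mismatch with the target format."""
    try:
        with wave.open(path, "rb") as wf:
            return check_params(wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
    except wave.Error as exc:
        # wave only reads uncompressed PCM, so encodings such as Microsoft ADPCM
        # are rejected here instead of being reported field by field.
        return [f"not a readable PCM WAV file: {exc}"]
```

For a file matching the converted output above, `check_wav` should return an empty list. The original stereo ADPCM file would instead be reported as unreadable, because `wave` does not decode ADPCM.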
Yes, the files play normally after the conversion, but the results are still bad. This worked fine before, even with this conversion.
Hello @patapizza,
Any updates on this?
Closing due to no movement on the issue. Please re-open or file a new task should the issue be persisting.