wit Behavior changes in Speech API for mp3 files (PT)

Do you want to request a feature, report a bug, or ask a question about wit?

bug

What is the current behavior?

Passing an mp3 file (8k) to the Speech API (Portuguese) often returns responses containing whitespace (" "), and the transcription accuracy is lower than before.

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.

Make a POST request to the /speech endpoint with an mp3 file, using an app configured for Portuguese.
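A minimal sketch of that reproduction step, using only the Python standard library. The URL `https://api.wit.ai/speech`, the bearer-token header, and the `audio/mpeg3` content type are assumptions based on the public Wit.ai HTTP API docs, not values confirmed in this report. The helper names and the token are hypothetical.

```python
import urllib.request

# Assumed public endpoint; not stated in the report itself.
WIT_SPEECH_URL = "https://api.wit.ai/speech"


def build_speech_request(audio_bytes: bytes, token: str,
                         content_type: str = "audio/mpeg3") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        WIT_SPEECH_URL,
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # server access token of the app
            "Content-Type": content_type,        # assumed value for mp3 input
        },
    )


def send_speech_request(req: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

To reproduce, read the mp3 file in binary mode, pass its bytes to `build_speech_request` together with the app's token, and call `send_speech_request` several times on the same file to compare the returned text.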

What is the expected behavior?

The same results as before the problem

If applicable, what is the App ID where you are experiencing this issue? If you do not provide this, we cannot help.

App Id: 2621810151445138

Aug 05 '21 17:08 DiegoHSeto

Hi @DiegoHSeto,

Thanks for reporting. Let me look into it.

In the meantime, upgrading to 16k sample rate flac/raw/wav as recently suggested in the API docs would both solve the issue and speed up the processing. It's something you might want to look into.

Aug 05 '21 19:08 patapizza
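One possible way to do the suggested conversion is to call ffmpeg from Python. This is a sketch that assumes ffmpeg is installed and on the PATH. The function names and file paths are hypothetical, and the mono flag is included because stereo is reported as unsupported later in this thread.

```python
import subprocess


def ffmpeg_to_16k_wav_cmd(src: str, dst: str) -> list:
    """Return an ffmpeg command that writes 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", src,             # input file (mp3, ADPCM wav, ...)
        "-ac", "1",            # downmix to a single channel
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # signed 16-bit little-endian PCM
        dst,
    ]


def convert_to_16k_wav(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_16k_wav_cmd(src, dst), check=True)
```

Note that resampling an 8 kHz recording to 16 kHz changes the container format but does not restore frequency content that was never recorded.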

Hello @patapizza ,

Thanks for the quick response.

I'm still having problems using wav with a 16k sample rate. I'm getting several different results from the API for the same audio, for example:

Expected result (sample of 5 seconds of the audio):

"me chamo Luana, falo do grupo Katton sou especialista do Hambek tudo bem"

API results:

1 - "me chamo Luana, falo do grupo Kat e sou especialista do tudo"

2 - "me chamo Luana, falo do grupo Kat e sou especialista do Hambek , tudo bem"

3 - "Me chamo Luana, falo do grupo Kat e sou especialista do tudo"

4 - "Me chamo Luana, falo do grupo e sou especialista do tudo"

I know that the result depends on the quality of the audio, but I never experienced this behavior before with these samples. Some words are being cut from the result, while other words like "Katton" (which were being predicted correctly before) are now being predicted wrongly. Sometimes the API returns an empty text too.

Aug 05 '21 20:08 DiegoHSeto
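To put a number on how far each API result is from the expected transcription, one option is a word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is an illustration, not something used in this thread. The function names are hypothetical, and lowercasing and dropping punctuation before comparison are assumptions.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # reference word missing from result
                cur[j - 1] + 1,          # extra word in result
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Calling `word_error_rate` with the expected sentence as `reference` and each of the four API results as `hypothesis` gives one score per response, which makes the run-to-run variation easier to compare.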

Hi @DiegoHSeto, can you confirm the encoding of the audio using the file command?

Aug 05 '21 21:08 patapizza

Before conversion:

RIFF (little-endian) data, WAVE audio, Microsoft ADPCM, stereo 8000 Hz

After conversion (being passed to the API):

RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz

Aug 05 '21 21:08 DiegoHSeto
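The same properties reported by the file command can also be checked in a script before upload. This is a minimal sketch using Python's standard `wave` module. The function names are hypothetical, and the target of mono, 16-bit, 16000 Hz is taken from the converted file described above. The `wave` module reads only uncompressed PCM, so an ADPCM file like the pre-conversion one is treated here as failing the check.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, sample rate and bit depth of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "bits": w.getsampwidth() * 8,
        }


def is_mono_16k_pcm16(path: str) -> bool:
    """True if the file is mono, 16-bit PCM at 16000 Hz."""
    try:
        info = describe_wav(path)
    except wave.Error:
        # Raised for non-PCM data (e.g. ADPCM) or a malformed header.
        return False
    return info == {"channels": 1, "sample_rate": 16000, "bits": 16}
```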

Stereo isn't supported. The new format looks good. Are the files playing normally?

Aug 06 '21 05:08 patapizza

> Stereo isn't supported. The new format looks good. Are the files playing normally?

Yes, the files are playing normally after the conversion, but the results are still bad. This worked fine previously, even with this conversion.

Aug 06 '21 13:08 DiegoHSeto

Hello @patapizza,

Any updates on this?

Aug 06 '21 22:08 DiegoHSeto

Closing due to no movement on the issue. Please re-open or file a new task if the issue persists.

Apr 18 '23 10:04 Barbog
