wit
Speech - Support other Audio Formats
Currently wit supports PCM (WAV), mp3, ulaw, and raw.
I'm working with the iOS framework, which uses PCM. I've tweaked it to use ulaw instead, and on slow internet connections (bad 3G and EDGE) it nearly doubles the speed of recognition (from 10 seconds down to about 5), but the sound quality is worse.
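As a rough sanity check on that speedup, here is a small sketch of the payload arithmetic. It assumes mono audio and the same sample rate for both encodings; the 16000 Hz figure is a placeholder, not necessarily what the iOS SDK records at. 16-bit PCM costs two bytes per sample and ulaw one, so the upload is about half the size, which is consistent with the time roughly halving on a bandwidth-bound connection.

```python
def bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed payload size of one second of audio, in bytes."""
    return sample_rate_hz * bits_per_sample * channels // 8


# Assumed sample rate, for illustration only.
SAMPLE_RATE_HZ = 16000

pcm = bytes_per_second(SAMPLE_RATE_HZ, 16)   # 16-bit linear PCM
ulaw = bytes_per_second(SAMPLE_RATE_HZ, 8)   # ulaw stores 8 bits per sample

print(pcm, ulaw, pcm / ulaw)
```

The ratio does not depend on the chosen sample rate, only on the bits per sample.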
iOS does not support recording in mp3 format. It offers the following (https://developer.apple.com/library/ios/qa/qa1615/_index.html):
kAudioFormatAppleLossless (alac), kAudioFormatAppleIMA4 (ima4), kAudioFormatiLBC (ilbc), kAudioFormatULaw (ulaw), kAudioFormatALaw (alaw)
According to http://chinaxxren.iteye.com/blog/1750476, the best trade-off between compression and sound quality would be provided by ima4, or at least alac. ilbc might be an option too. I'm not a codec expert, but my guess is that anything is better than pcm and ulaw.
Could you add ima4, alac or ilbc support to /speech? I'd update the iOS sdk accordingly.
I believe the encoding ima-adpcm that we provide is actually ima4, but haven't had the chance to try it.
Could you try and let us know?
Thanks, I didn't notice that. I've tried it, but I can't get it to work: my stream is not recognised by the server. For example:
2016-05-04 03:01:41.528 Wave[384:35387] Wit response 200 (0.761610 s) { "msg_id" : "bb141d03-b343-4f2d-b78d-9c9d74605d30", "_text" : null, "outcomes" : [ ] }
I'm not saying this is a server-side issue; it could be that I'm setting something up wrong on my end. You can find my code in my wit-ios-sdk fork: https://github.com/hactar/wit-ios-sdk
In particular, the recorder: https://github.com/hactar/wit-ios-sdk/blob/master/Wit/WITRecorder.m#L216-L225 and the uploader: https://github.com/hactar/wit-ios-sdk/blob/master/Wit/WITUploader.m#L80-L82
The code can be triggered by adding formatToUse = kAudioFormatAppleIMA4; to https://github.com/hactar/wit-ios-sdk/blob/master/Wit/WITRecordingSession.m#L63
I use Wit for a Facebook bot, and the user audio is sent as an mp4 file. It would be nice if you could accept that format as well. Currently I'm trying to convert from .mp4 to .wav before calling the wit api, but it causes a huge delay in answering the user (download the audio => convert audio => understand audio => answer user).
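For anyone following the same download => convert => understand pipeline, here is a minimal sketch of the conversion step that avoids a temporary .wav file by piping ffmpeg's output straight into memory. It assumes ffmpeg is installed and on the PATH; the function names and the 16000 Hz mono target are illustrative choices, not anything Wit requires.

```python
import subprocess
from typing import List


def ffmpeg_to_wav_args(src_path: str, sample_rate_hz: int = 16000) -> List[str]:
    """Build an ffmpeg command line that decodes src_path to mono WAV on stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", src_path,
        "-vn",                       # ignore any video stream in the mp4
        "-ac", "1",                  # downmix to mono
        "-ar", str(sample_rate_hz),  # resample
        "-f", "wav",
        "pipe:1",                    # write to stdout instead of a file
    ]


def mp4_to_wav_bytes(src_path: str) -> bytes:
    """Run ffmpeg and return the WAV data; raises CalledProcessError on failure."""
    result = subprocess.run(ffmpeg_to_wav_args(src_path), check=True, capture_output=True)
    return result.stdout


args = ffmpeg_to_wav_args("voice_message.mp4")  # hypothetical file name
print(" ".join(args))
```

One caveat: when writing to a pipe, ffmpeg cannot seek back to fill in the length fields of the WAV header, so a strict decoder may reject the result. This removes the disk round trip but not the decoding cost itself.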
Is there any update on mp4 support? As @JoabMendes said, Facebook provides audio files as mp4, and this issue has no update on whether support is being developed or is on the roadmap. Converting to another format is a long detour that can cause problems before we can reply to the user. Is there a way to send mp4 as raw as a workaround, for example? If so, what parameters should be provided?
And thanks in advance for your great work.
We don't currently have any plans to support mp4, but you're right that it is something we should consider. Thanks.
Hi. The mp4 from FB Messenger is audio only, and the codec used is AAC LC. Even using the content type audio/raw with the parameters that define the encoding, the response is always empty. Neither audio files nor streaming work.
So are there any fast workarounds that avoid converting the audio, or other solutions to convert mp4 to text?
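A likely reason audio/raw returns nothing here: raw describes uncompressed samples, while AAC LC is a compressed codec, so the bytes would need to be decoded to PCM before any raw parameters could describe them. For reference, this is a sketch of how the raw Content-Type header is assembled once decoded PCM is available. The parameter names (encoding, bits, rate, endian) are as I understand them from the Wit HTTP API docs and should be checked there; the helper name and the example values are hypothetical.

```python
def raw_content_type(encoding: str, bits: int, rate: int, endian: str) -> str:
    """Assemble an audio/raw Content-Type value describing uncompressed samples."""
    return f"audio/raw;encoding={encoding};bits={bits};rate={rate};endian={endian}"


# Example values for 16-bit signed little-endian PCM at 16 kHz (assumed, not required).
header = raw_content_type("signed-integer", 16, 16000, "little")
print(header)
```

The values must match what the decoder actually produced; a mismatch in rate or endianness typically yields empty or nonsense transcriptions rather than an error.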
Closing due to no movement on the issue. Please re-open or file a new task should the issue be persisting.
I agree. It is still not planned. No problem.
On Tue, Apr 18, 2023, 3:54 AM Barbog @.***> wrote:
Closed #217 https://github.com/wit-ai/wit/issues/217 as not planned.