openai-whatsapp-chatbot

Add the option to use Whisper instead of Assembly AI to generate audio transcriptions

Open simonsanvil opened this issue 2 years ago • 3 comments

In keeping with the theme of the repo, it would be better to use OpenAI's Whisper as the default option for voice-message transcriptions instead of AssemblyAI (which was initially included because this was intended to be submitted to their 2022 Winter Hackathon). We can either serve the model from the app (perhaps in a separate container) or use HuggingFace's inference endpoints.

simonsanvil avatar Dec 16 '22 22:12 simonsanvil

This week OpenAI released their own API endpoint for audio transcription using Whisper. I attempted to include it in the last release of the app, but it doesn't seem to support the file format of WhatsApp audio files (ogg) yet, so a preprocessing step that downloads the file and converts it to mp3 would have to be added. Since I thought this would slow the bot down even more when answering audio messages, I decided to leave it out of this last release. I might explore it more in the near future.
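
For reference, a minimal sketch of what that preprocessing step could look like, not code from the app itself. It assumes the ogg file has already been downloaded to disk, that the `ffmpeg` binary is on the PATH, and that the `openai` Python package (version 1.0 or later) is installed with `OPENAI_API_KEY` set. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build the ffmpeg argument list that re-encodes an audio file to mp3."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(Path(src)),      # input file (e.g. the WhatsApp .ogg)
        "-vn",                     # ignore any video/cover-art stream
        "-codec:a", "libmp3lame",  # encode audio as mp3
        str(Path(dst)),
    ]


def convert_ogg_to_mp3(src, dst):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)


def transcribe_with_whisper(mp3_path):
    """Send the converted file to OpenAI's transcription endpoint."""
    # Imported lazily so the conversion helpers work without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(mp3_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return result.text
```

Usage would be along the lines of `transcribe_with_whisper(convert_ogg_to_mp3("voice.ogg", "voice.mp3"))`. The conversion adds one extra process call per voice message, which is the latency cost mentioned above.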

simonsanvil avatar Mar 04 '23 19:03 simonsanvil

Deepgram is also an option. It costs much less and has speaker diarisation, among other features. (They also offer Whisper.)

eladrave avatar Mar 08 '23 21:03 eladrave

Definitely something we could try


simonsanvil avatar Mar 09 '23 20:03 simonsanvil