chainlit
Allow audio input
This feature would allow people to directly use their microphone to input audio. For example, Whisper could then transcribe this into a text message.
If I understand correctly, this is about using your own transcription model instead of the browser's?
yes
This is something I am interested in as well. I would be happy to help add this functionality if I can be given some idea of where to start in the code base (I found the current browser implementation in the React code, but am struggling to understand how the React UI interacts with the Python code).
We acknowledge the community interest in this feature and we are open to contributions. The easiest way to do it is probably to record the audio as an MP3 file through the browser APIs and send this audio element with an empty message. Sending elements with a message is already supported (as in the multi-modal examples).
I am also looking for this feature. @jdb78 did you find a solution?
A .wav file is not received correctly. Is only the MP3 format supported?
I wanted to tackle this issue, gave it a go on my side, and made some progress.
So far the only issue I see is with the browser audio APIs: I was only able to record and generate a .webm file (can generate .wav).
Any suggestions on how to tackle this? Or should we do the conversion on the backend with ffmpeg?
Hi @b4s36t4. Thank you for giving it a go! I believe most speech-to-text APIs (including Whisper) can support wav files, thus being able to access a user's microphone recording in the backend as a wav file would be a major step forward.
I believe that depending on the use-case, it could be changed to MP4 in the backend via ffmpeg, but for the functionality stated by @jdb78 (which my team is also looking for), what you described would already be absolutely fantastic!
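For reference, the backend conversion discussed above could look roughly like the following. This is only a minimal sketch, not Chainlit code: it assumes the `ffmpeg` binary is installed and on `PATH`, and the function names `build_ffmpeg_command` and `webm_to_wav`, as well as the 16 kHz mono output, are choices made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list for converting src to a mono WAV file.

    Hypothetical helper: the 16 kHz mono default is an assumption,
    not something a particular speech-to-text API requires.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input file (e.g. the browser's .webm)
        "-ac", "1",                # downmix to a single audio channel
        "-ar", str(sample_rate),   # resample to the requested rate
        str(dst),                  # output format is inferred from ".wav"
    ]


def webm_to_wav(src, dst=None):
    """Convert a recorded .webm file to .wav and return the output path."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_suffix(".wav")
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst
```

Calling `webm_to_wav("recording.webm")` would write `recording.wav` next to the input, which could then be passed to whatever transcription function the app uses.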
I think it would be better to use WebRTC for this, because ffmpeg doesn't work correctly on Mac and Windows, and WebRTC is made for this. It would also open up the possibility of using something like https://github.com/toverainc/willow-inference-server
It would be great to have audio working in Chainlit. Below is the code that I use to do audio conversions in Gradio.
```python
import openai  # the calls below use the openai>=1.0 module-level client


# Transcribe audio to text
def speech_to_text(self, audio):
    # Use OpenAI Whisper API. API reference: https://platform.openai.com/docs/api-reference/audio/createTranscription
    # Here audio is a path to the audio file
    try:
        self.log.info("Temporary audio file path: " + audio)
        # Open in a context manager so the file handle is always closed
        with open(audio, "rb") as audio_file:
            transcript = openai.audio.transcriptions.create(
                model="whisper-1", file=audio_file, response_format="text"
            )
        self.log.info("Transcript: " + transcript)
        return transcript
    except Exception as e:
        self.log.error(e)
        return None


# Generate audio from text using OpenAI API
def text_to_speech(self, text):
    try:
        # generate_uuid() is a helper that is not shown in this snippet
        file_path = self.audio_out_dir + "/" + str(generate_uuid()) + ".m4a"
        with openai.audio.speech.with_streaming_response.create(
            model="tts-1", voice="shimmer", input=text, response_format="aac"
        ) as response:
            response.stream_to_file(file_path)
        return file_path
    except Exception as e:
        self.log.error(e)
        return None
```
Note: This is just a snippet from my audio utility class.
I really appreciate the great work on microphone voice input in Chainlit. I set it up by following the audio-assistant example, and it works well on the laptop where I launched the Chainlit app (see the picture below). But on other devices within the same Wi-Fi network, using the same IP and port, voice/audio does not work: the mic icon does not respond, although all other functions are fine. Any ideas or directions to resolve it?
I think I am facing a similar issue to @chenrq2005. If I run my Chainlit app on a specific address (not localhost), the voice/audio function is gone.
@chenrq2005 have you found a solution to your problem?
Reopening this issue @chenrq2005 @Nau-git, I am facing a similar issue in #1295. @willydouhard please help with this.
I figured out why.
When you inspect the Chainlit app, you'll see an error saying getUserMedia() is undefined. According to [this MDN page](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia), it's because the URL is not secure, i.e. it's not an "https" URL. So you need to deploy on an https link.
For mic usage you have to deploy over HTTPS if the host is not localhost. You can create a self-signed cert for private-IP testing and install that cert on the testing machine so the browser relaxes its security blocking.
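To check quickly whether a given app URL is likely to have microphone access, the rule described above can be approximated in a few lines. This is only a rough sketch of the browsers' "secure context" behaviour (HTTPS, or a localhost/loopback host), not the exact algorithm any browser implements, and `is_probably_secure_context` is a hypothetical helper name.

```python
import ipaddress
from urllib.parse import urlsplit


def is_probably_secure_context(url):
    """Roughly predict whether a browser would expose getUserMedia() on this URL.

    Approximation only: HTTPS URLs and localhost/loopback hosts are treated
    as secure; every other plain-HTTP origin is treated as insecure.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        return True
    host = parts.hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # 127.0.0.0/8 and ::1 count as loopback addresses
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is an ordinary hostname served over HTTP
        return False
```

With this check, `http://localhost:8000` and `https://192.168.1.20:8000` pass, while `http://192.168.1.20:8000` does not, which matches the behaviour reported earlier in this thread where the mic only worked on the machine running the app.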