Allow audio input

Open jdb78 opened this issue 1 year ago • 10 comments

This feature would allow people to directly use their microphone to input audio. For example, Whisper could then transcribe this into a text message.

jdb78 avatar Jan 03 '24 09:01 jdb78

If I understand correctly, this is about using your own transcription model instead of the browser's?

willydouhard avatar Jan 03 '24 11:01 willydouhard

yes

jdb78 avatar Jan 03 '24 12:01 jdb78

This is something I am interested in as well. I would be happy to help add this functionality if someone can give me an idea of where to start in the code base. (I found the current browser implementation in the React code, but am struggling to understand how the React UI interacts with the Python code.)

datapay-ai avatar Jan 03 '24 14:01 datapay-ai

We acknowledge the community interest in this feature and we are open to contributions. The easiest way to do it is probably to record the audio as an MP3 file through the browser APIs and send this audio element with an empty message. Sending elements with a message is already supported (as in the multimodal examples).

willydouhard avatar Jan 03 '24 14:01 willydouhard

I am also looking for this feature. @jdb78 did you find a solution?

Girrajjangid avatar Jan 25 '24 16:01 Girrajjangid

The .wav file is received incorrectly. Is only the MP3 format supported?

llf10811020205 avatar Apr 24 '24 11:04 llf10811020205

I wanted to tackle this issue, gave it a go on my side, and made some progress.

So far the only issue I see is with the browser audio APIs: I was only able to record and generate a .webm file (can generate .wav).

Any suggestions on how to tackle this? Or should we do the conversion on the backend with ffmpeg?
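
For reference, a minimal sketch of what the backend conversion could look like, assuming ffmpeg is installed and on PATH. The function names `build_ffmpeg_command` and `convert_to_wav` and the 16 kHz mono target are illustrative assumptions, not part of Chainlit.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that converts `src` to a mono WAV file at `dst`."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(src),            # input file, e.g. a browser-recorded .webm
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
        str(dst),                  # ffmpeg infers the WAV format from the extension
    ]


def convert_to_wav(src: Path, sample_rate: int = 16000) -> Path:
    """Convert `src` to a WAV file next to it and return the new path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = src.with_suffix(".wav")
    if dst == src:
        raise ValueError("input is already a .wav file")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(
        build_ffmpeg_command(src, dst, sample_rate),
        check=True,
        capture_output=True,
    )
    return dst
```

Calling `convert_to_wav(Path("recording.webm"))` would write `recording.wav` alongside the input; whether 16 kHz mono is the right target depends on the speech-to-text service being used.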

b4s36t4 avatar Apr 25 '24 22:04 b4s36t4

Hi @b4s36t4. Thank you for giving it a go! I believe most speech-to-text APIs (including Whisper) can support wav files, thus being able to access a user's microphone recording in the backend as a wav file would be a major step forward.

I believe that depending on the use-case, it could be changed to MP4 in the backend via ffmpeg, but for the functionality stated by @jdb78 (which my team is also looking for), what you described would already be absolutely fantastic!

MaxMLang avatar Apr 28 '24 12:04 MaxMLang

I think it would be better to use WebRTC for this, because ffmpeg doesn't work correctly on Mac and Windows, and WebRTC is made for this. It also opens up the possibility of using something like https://github.com/toverainc/willow-inference-server

didlawowo avatar Apr 30 '24 18:04 didlawowo

It would be great to have audio working in Chainlit. Below is the code that I use to do audio conversions in Gradio.

    # Transcribe audio to text
    def speech_to_text(self, audio):
        # Use OpenAI Whisper API. API reference: https://platform.openai.com/docs/api-reference/audio/createTranscription
        # Here audio is a path to the audio file
        try:
            self.log.info("Temporary audio file path: " + audio)
            # Use a context manager so the file handle is always closed
            with open(audio, "rb") as audio_file:
                transcript = openai.audio.transcriptions.create(
                    model="whisper-1", file=audio_file, response_format="text"
                )
            self.log.info("Transcript: " + transcript)
            return transcript
        except Exception as e:
            self.log.error(e)
        return None

    # Generate audio from text using OpenAI API
    def text_to_speech(self, text):
        try:

            file_path = self.audio_out_dir + "/" + str(generate_uuid()) + ".m4a"

            with openai.audio.speech.with_streaming_response.create(
                model="tts-1", voice="shimmer", input=text, response_format="aac"
            ) as response:
                response.stream_to_file(file_path)
                return file_path
        except Exception as e:
            self.log.error(e)
        return None

Note: This is just a snippet from my audio utility class.

sumitsahoo avatar May 08 '24 14:05 sumitsahoo

I really appreciate the great work on microphone voice input in Chainlit. I set it up by following the audio-assistant example, and it works well on the laptop where I launched the Chainlit app (see the picture below). But on other devices on the same Wi-Fi network, using the same IP and port, voice/audio does not work: the mic icon does not respond, although all other functions work fine. Any ideas or directions for resolving it?

image

chenrq2005 avatar Jun 10 '24 15:06 chenrq2005

I think I am facing a similar issue to @chenrq2005's. If I run my Chainlit app on a specific address (not localhost), the voice/audio function is gone.

@chenrq2005 have you found a solution to your problem?

Nau-git avatar Aug 16 '24 08:08 Nau-git

Reopening this issue, @chenrq2005 @Nau-git: I am facing a similar issue in #1295. @willydouhard, please help with this.

ambiSk avatar Sep 04 '24 12:09 ambiSk

I figured out why.

When you inspect the Chainlit app, you'll see an error saying getUserMedia() is undefined. According to [this page](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia), it's because the URL is not secure, i.e. it's not an "https" URL. So you need to deploy on an https link.

tituslhy avatar Sep 05 '24 06:09 tituslhy

For mic usage you have to deploy over https if the host is not localhost. You can create a self-signed cert for private IP testing and install that cert on the testing machine so that the browser relaxes its security blocking.
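
As a rough illustration of that rule, here is a small sketch that approximates whether a browser is likely to expose the microphone for a given app URL. The helper name `is_probably_secure_context` and the example addresses are hypothetical, and the browser's own secure-context check is the real authority; this only mirrors the "https or localhost/loopback" rule of thumb.

```python
import ipaddress
from urllib.parse import urlsplit


def is_probably_secure_context(url: str) -> bool:
    """Approximate the browser rule: https, localhost, or a loopback IP."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return True
    host = parts.hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # 127.0.0.0/8 and ::1 count as loopback addresses
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, and not localhost: treat plain http as insecure
        return False


for url in (
    "http://localhost:8000",
    "http://192.168.1.20:8000",
    "https://192.168.1.20:8000",
):
    print(url, is_probably_secure_context(url))
```

Under this approximation, a plain-http private IP such as the second URL is the case where the mic button stops working, which matches the reports above.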

puppetm4st3r avatar Sep 09 '24 01:09 puppetm4st3r