Add Audio Input Widget

Open MarcSkovMadsen opened this issue 1 year ago • 4 comments

Closing https://github.com/holoviz/panel/issues/4048.

For now, this is an exploration to inform the design of the widget.

Related issues

  • https://github.com/holoviz/panel/issues/7035
  • https://github.com/holoviz/panel/issues/7090
  • https://github.com/holoviz/panel/issues/7021 (We should show how to integrate audio input with chat)
  • https://github.com/holoviz/panel/issues/4861

Design

Inspiration

  • Gradio Real time speech recognition: https://www.gradio.app/guides/real-time-speech-recognition
    • https://github.com/gradio-app/gradio/blob/main/gradio/components/audio.py
    • https://github.com/gradio-app/gradio/blob/main/js/audio/recorder/AudioRecorder.svelte
  • Conversion in the browser: https://stackoverflow.com/questions/57365486/converting-blob-webm-to-audio-file-wav-or-mp3
  • Wave Surfer: https://wavesurfer.xyz/docs/ and recording example https://wavesurfer.xyz/examples/?record.js
  • Streamlit Experimental Audio Recorder https://docs.streamlit.io/develop/api-reference/widgets/st.audio_input and https://github.com/streamlit/streamlit/tree/develop/frontend/lib/src/components/widgets/AudioInput.
  • Streamlit Audio Recorder, https://github.com/stefanrmmr/streamlit-audio-recorder, Audio React Recorder: https://doppelgunner.github.io/audio-react-recorder/, https://github.com/doppelgunner/audio-react-recorder?tab=readme-ov-file
  • Audio Recorder Streamlit: https://github.com/Joooohan/audio-recorder-streamlit
  • Streamlit-audiorecorder: https://github.com/theevann/streamlit-audiorecorder and https://github.com/samhirtarif/react-audio-recorder
  • https://github.com/whitphx/streamlit-webrtc
  • OpenAI Realtime API https://platform.openai.com/docs/guides/realtime/overview

Questions

Design Decisions to be taken

  • Do we want to focus on audio input only, or combine audio and video? The Media Stream API supports both.
    • [x] Audio.
    • [ ] Video
  • What should the name be?
    • [ ] Microphone
    • [x] AudioInput
    • [ ] AudioRecorder
  • Do we want to enable incremental streaming, or just send the value when the recording is finished?
    • [x] Final Value
    • [ ] Streaming value
  • Do we want to support more value formats than the default webm (mp3, ogg, wav, etc.)? Converting on the client side might require cross-origin isolation. Converting on the server side might require an ffmpeg installation.
    • [x] mp3, ogg, wav
    • [x] Conversion on client side
  • Do we want bare minimum UI (Start, Stop, Pause)? Or extra features:
    • [x] submit button
    • [x] playback button?
    • [x] wave graph?
    • [ ] editing possibilities?
  • Do we want compact UI like Streamlit or large UI like Gradio?
    • [x] Compact UI
    • [ ] Large UI
  • Do we want to build on the raw MediaStream Recording API or on a library?
    • [x] Wavesurfer https://wavesurfer.xyz/docs/ (what Gradio and Streamlit are using)
    • [ ] React Audio Recorder https://github.com/samhirtarif/react-audio-recorder (looks really simple to implement, but uses React)
  • Do we want to make it easy for users to:
    • [x] play the value in the Audio pane?
    • [x] work with the value as a NumPy array?
    • [x] work with the value as a data URL?
    • [x] save the value to a file?
  • How do we most efficiently transfer the media from the client to the server?
    • [x] Bytes (I don't yet know how to do this)
    • [ ] Data URL (this is easily doable)

Items marked with [x] are the options I propose we implement.
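To make the "bytes vs. data URL" and "work with the value" questions more concrete, here is a minimal server-side sketch using only the Python standard library. It assumes the widget hands over either raw bytes or a base64 data URL, and that the audio has already been converted to 16-bit PCM WAV on the client. All function names below are hypothetical helpers for illustration; none of them are part of Panel.

```python
import base64
import io
import struct
import wave


def bytes_to_data_url(raw: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL (roughly 33% larger than the bytes)."""
    payload = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def data_url_to_bytes(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def wav_bytes_to_samples(raw: bytes) -> tuple[int, tuple[int, ...]]:
    """Decode 16-bit PCM WAV bytes into (sample_rate, samples).

    Assumption: 16-bit samples. For multi-channel audio the samples
    are returned interleaved, exactly as stored in the file.
    """
    with wave.open(io.BytesIO(raw), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # WAV stores samples little-endian, hence the explicit "<" prefix.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    return rate, samples


def save_audio(raw: bytes, path: str) -> None:
    """Write the recorded bytes to a file unchanged."""
    with open(path, "wb") as file:
        file.write(raw)


def make_test_wav(rate: int = 8000, n_samples: int = 800) -> bytes:
    """Build a short mono 16-bit WAV (a simple ramp) to stand in for a recording."""
    values = [(i % 100) * 100 for i in range(n_samples)]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{n_samples}h", *values))
    return buffer.getvalue()
```

If NumPy is available, `numpy.frombuffer(frames, dtype="<i2")` would give the array form directly from the same `frames` bytes. The round trip through `bytes_to_data_url` and `data_url_to_bytes` also shows why bytes are the more efficient transfer format: the base64 payload is always larger than the raw recording.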

MarcSkovMadsen avatar Oct 06 '24 07:10 MarcSkovMadsen

Codecov Report

Attention: Patch coverage is 0% with 39 lines in your changes missing coverage. Please review.

Project coverage is 82.14%. Comparing base (5ef8909) to head (a0d9dce). Report is 258 commits behind head on main.

Files with missing lines       Patch %   Lines
panel/widgets/microphone.py    0.00%     39 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7363      +/-   ##
==========================================
- Coverage   82.21%   82.14%   -0.08%     
==========================================
  Files         337      338       +1     
  Lines       50513    50552      +39     
==========================================
- Hits        41530    41524       -6     
- Misses       8983     9028      +45     

codecov[bot] avatar Oct 06 '24 08:10 codecov[bot]

If you are interested in audio input, feel free to comment on the questions above, @philippjfr and @ahuang11.

MarcSkovMadsen avatar Oct 06 '24 08:10 MarcSkovMadsen

Not questions (yet), but is there a corresponding JS file?

philippjfr avatar Oct 15 '24 12:10 philippjfr

No. Currently just exploring the design.

MarcSkovMadsen avatar Oct 15 '24 17:10 MarcSkovMadsen