gradio
Allow for a selection of both webcam/mic input and uploading at the same time for `audio`, `video` and `image`
- [x] I have searched to see if a similar issue already exists.
Is your feature request related to a problem? Please describe.
It is common to want to let the user choose between uploading an audio file and recording from the microphone, or between uploading a picture and taking a photo with the webcam. The same goes for videos.
Describe the solution you'd like
- Add a `source==both` to `gr.Video()` that allows the user to choose whether to use video from the webcam or upload a video
- Add a `streaming==both` to `gr.Audio()` and `gr.Image()` that allows the user to choose whether to record audio/take a picture or upload the media
Additional context
This came up when I visited this Space to transcribe an audio file I had and couldn't do it.
Similar to #1593. But I'm not a fan of combining multiple sources in the backend since the choice of `source` can affect other parameters as well. For example, the `source` of an `Image` affects the default value of `tool`. Similarly, the `source` of a `Video` affects the default value of `include_audio`, which affects how the input file is preprocessed. Handling all of these cases would add significant complexity that we'd need to manage, and this isn't even considering the implementation on the frontend side.
As an alternative, users can achieve the desired functionality themselves (with admittedly more code) by using Tabs or by changing the visibility of components. I think this is more transparent and easier to manage. WDYT? cc @aliabid94 @freddyaboulton
There are actually several examples of community-built demos for Whisper that leverage Tabs to achieve this already, e.g. https://huggingface.co/spaces/fffiloni/whisper-to-stable-diffusion
From a UI/UX perspective imo the use-case of "either upload or record" is quite common and as you mentioned building this either/or system via tabs is a bit complex to do. IMO streamlining it would benefit many demos, but the trade-off with how messy that is with the backend lies with y'all ofc!
In case anyone needs to achieve this kind of functionality in Blocks, here's a code snippet: https://huggingface.co/spaces/abidlabs/mic_or_file/blob/main/app.py
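For readers who just want the shape of the Tabs workaround, here is a minimal sketch. It is not the contents of the linked `app.py`. It assumes the Gradio 3.x API that this thread was written against, where `gr.Audio` takes a single `source` string, and the helper names `pick_source`, `describe` and `build_demo` are made up for illustration.

```python
from typing import Optional


def pick_source(mic_path: Optional[str], file_path: Optional[str]) -> str:
    """Return whichever input the user supplied, preferring the microphone."""
    chosen = mic_path or file_path
    if chosen is None:
        raise ValueError("Record from the microphone or upload a file first.")
    return chosen


def describe(mic_path: Optional[str], file_path: Optional[str]) -> str:
    # Placeholder for real processing, e.g. passing the path to a transcriber.
    return f"Would process: {pick_source(mic_path, file_path)}"


def build_demo():
    # Imported lazily so the helpers above can be used without gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        # One tab per source; both components feed the same function.
        # Assumption: Gradio 3.x, where `source` is a single string.
        with gr.Tab("Microphone"):
            mic = gr.Audio(source="microphone", type="filepath")
        with gr.Tab("Upload"):
            upload = gr.Audio(source="upload", type="filepath")
        output = gr.Textbox(label="Result")
        gr.Button("Run").click(describe, inputs=[mic, upload], outputs=output)
    return demo
```

Calling `build_demo().launch()` would start the app. The function receives both inputs on every click, so the either/or decision lives in `pick_source` rather than in the component itself.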
Based on discussions with @pngwn, it makes the most sense for this to be a new component rather than an option within the existing gradio library (internal conversation here: https://huggingface.slack.com/archives/C02SPHC1KD1/p1675340839336399)
This is now done (and the default), as part of Gradio 4.0!
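As a rough illustration of what this looks like in Gradio 4.x, the sketch below assumes the `sources` list parameter on `gr.Audio`, `gr.Image` and `gr.Video`. The `MEDIA_SOURCES` mapping and `build_demo` are hypothetical names used only for this example, and the lists are written out explicitly even though the default should already offer both options.

```python
# Hypothetical mapping used only in this example: which sources to offer per media type.
MEDIA_SOURCES = {
    "audio": ["upload", "microphone"],
    "image": ["upload", "webcam"],
    "video": ["upload", "webcam"],
}


def build_demo():
    # Imported lazily so the mapping above can be inspected without gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        # Assumption: Gradio 4.x, where each component accepts a `sources` list
        # and lets the user switch between the listed inputs in the UI.
        gr.Audio(sources=MEDIA_SOURCES["audio"], type="filepath")
        gr.Image(sources=MEDIA_SOURCES["image"])
        gr.Video(sources=MEDIA_SOURCES["video"])
    return demo
```

Calling `build_demo().launch()` would start the app with a single component per media type instead of one tab per source.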