Allow for a selection of both webcam/mic input and uploading at the same time for `audio`, `video` and `image`

Open apolinario opened this issue 1 year ago • 4 comments

  • [x] I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
It is common to want to let the user choose between uploading an audio file and recording from the microphone, or between uploading a picture and taking a photo with the webcam. The same applies to videos.

Describe the solution you'd like

  • Add a `source == both` option to `gr.Video()` that lets the user choose whether to use video from the webcam or upload a video
  • Add a `streaming == both` option to `gr.Audio()` and `gr.Image()` that lets the user choose whether to record audio/take a picture or upload the media

Additional context
This issue came up when I visited this Space to transcribe an audio file I had and couldn't do it.

apolinario avatar Jan 31 '23 16:01 apolinario

Similar to #1593. But I'm not a fan of combining multiple sources in the backend since the choice of source can affect other parameters as well. For example, the source of an `Image` affects the default value of `tool`. Similarly, the source of a `Video` affects the default value of `include_audio`, which affects how the input file is preprocessed. Handling all of these cases would add significant complexity that we'd need to manage, and this isn't even considering the implementation on the frontend side.

As an alternative, users can achieve the desired functionality themselves (with admittedly more code) by using Tabs or by changing the visibility of components. I think this is more transparent and easier to manage. WDYT? cc @aliabid94 @freddyaboulton

There are actually several examples of community-built demos for Whisper that leverage Tabs to achieve this already, e.g. https://huggingface.co/spaces/fffiloni/whisper-to-stable-diffusion

abidlabs avatar Jan 31 '23 22:01 abidlabs

From a UI/UX perspective imo the use-case of "either upload or record" is quite common and as you mentioned building this either/or system via tabs is a bit complex to do. IMO streamlining it would benefit many demos, but the trade-off with how messy that is with the backend lies with y'all ofc!

apolinario avatar Feb 01 '23 17:02 apolinario

In case anyone needs to achieve this kind of functionality in Blocks, here's a code snippet: https://huggingface.co/spaces/abidlabs/mic_or_file/blob/main/app.py
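For readers who cannot open the link, here is a minimal sketch of the Tabs-based idea, assuming a Gradio 3.x install where `gr.Audio` takes a single `source` argument. It is not the contents of the linked `app.py`; the names `pick_audio` and `build_demo` are hypothetical and exist only for illustration.

```python
from typing import Optional


def pick_audio(mic_path: Optional[str], file_path: Optional[str]) -> str:
    """Return a message for whichever input was supplied, preferring the recording."""
    chosen = mic_path or file_path
    if chosen is None:
        return "No audio provided."
    return f"Received audio at: {chosen}"


def build_demo():
    # Imported lazily so pick_audio can be used without Gradio installed.
    # Assumption: Gradio 3.x, where gr.Audio accepts source="microphone" or "upload".
    import gradio as gr

    with gr.Blocks() as demo:
        # One tab per input source; both components feed the same function.
        with gr.Tab("Record"):
            mic = gr.Audio(source="microphone", type="filepath", label="Microphone")
        with gr.Tab("Upload"):
            upload = gr.Audio(source="upload", type="filepath", label="Audio file")
        result = gr.Textbox(label="Result")
        submit = gr.Button("Submit")
        submit.click(pick_audio, inputs=[mic, upload], outputs=result)
    return demo
```

Calling `build_demo().launch()` would start the app. The handler receives both values and picks one, so the per-source defaults mentioned earlier in the thread stay attached to separate components.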

abidlabs avatar Feb 02 '23 13:02 abidlabs

Based on discussions with @pngwn, it makes the most sense for this to be a new component rather than an option within the existing gradio library (internal conversation here: https://huggingface.slack.com/archives/C02SPHC1KD1/p1675340839336399)

abidlabs avatar Feb 06 '23 21:02 abidlabs

This is now done (and the default), as part of Gradio 4.0!
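As a rough sketch of what this looks like in practice, assuming Gradio 4.x, where media components take a `sources` list instead of a single `source`. The helper names `describe_audio` and `build_demo` are hypothetical, and passing `sources` explicitly is only for clarity, since the comment above says both sources are the default.

```python
import os
from typing import Optional


def describe_audio(audio_path: Optional[str]) -> str:
    """Report the file name of the recorded or uploaded audio."""
    if audio_path is None:
        return "No audio provided."
    return f"Received: {os.path.basename(audio_path)}"


def build_demo():
    # Imported lazily so describe_audio can be used without Gradio installed.
    # Assumption: Gradio 4.x, where gr.Audio accepts a `sources` list.
    import gradio as gr

    return gr.Interface(
        fn=describe_audio,
        # A single component offers both recording and uploading.
        inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
        outputs="text",
    )
```

Calling `build_demo().launch()` would start the app; the user switches between microphone and upload inside the one component, so no Tabs workaround is needed.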

abidlabs avatar Nov 07 '23 00:11 abidlabs