
feat(ui): introduce single‑file Gradio demo (app.py)

Open eoffermann opened this issue 6 months ago • 0 comments

Summary

Adds a self-contained Gradio front-end that turns text prompts into audio clips using the existing diffusion sampler and BigVGAN vocoder. Launch it with one command, explore results in the browser, and download clips on demand. Nothing is written to disk unless the user clicks Download.


Highlights

  • Zero configuration: python app.py opens http://127.0.0.1:7860.
  • Five inputs: prompt, DDIM steps, duration, guidance scale, sample count (up to 10).
  • Parallel previews: up to 10 audio players appear dynamically; each has a built-in download button.
  • Stateless: all artefacts remain in RAM; nothing persists after the session.
  • One-time model load: models load once at import; subsequent generations reuse them.
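
The load-once behaviour in the last highlight can be sketched as follows. This is a minimal illustration, not the code in app.py: `_load_models`, `MODELS`, `generate`, and `LOAD_COUNT` are hypothetical names, and the placeholder strings stand in for the real sampler and vocoder objects.

```python
from typing import Dict, List

LOAD_COUNT = 0  # instrumentation for this sketch only; counts loader calls


def _load_models() -> Dict[str, str]:
    """Stand-in for the expensive sampler/vocoder construction (hypothetical)."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"sampler": "sampler-placeholder", "vocoder": "vocoder-placeholder"}


# Executed once, when the module is imported.
MODELS = _load_models()


def generate(prompt: str, n_samples: int) -> List[str]:
    """Reuse the module-level models; nothing is reloaded per request."""
    sampler = MODELS["sampler"]
    vocoder = MODELS["vocoder"]
    # Placeholder "clips": one descriptive string per requested sample.
    return [f"{prompt} #{i} via {sampler}+{vocoder}" for i in range(n_samples)]
```

With this layout the loading cost is paid at startup, and each later call to `generate` only pays for sampling.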

How to run

python app.py

That's it. The default browser opens automatically.


Implementation notes

  • Tested locally on a CUDA 12.4 GPU and on a CPU-only machine.
  • generate_and_update always returns a list of exactly MAX_AUDIO_PLAYERS gr.update objects, keeping Gradio’s diffing predictable.
  • TODOs are embedded in the docstring (GPU OOM handling, input validation, seed control).
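
The fixed-length return described for generate_and_update can be sketched as below. This is an illustration under assumptions, not the function in app.py: `pad_updates`, `make_update`, and `_plain_update` are hypothetical names, and the plain-dict factory is used so the sketch runs without Gradio installed. In the real app, `gr.update` would presumably be passed as `make_update`.

```python
from typing import Any, Callable, Dict, List, Sequence

MAX_AUDIO_PLAYERS = 10  # matches the sample-count cap in the Highlights


def _plain_update(**kwargs: Any) -> Dict[str, Any]:
    """Dependency-free stand-in for gr.update (assumption for this sketch)."""
    return dict(kwargs)


def pad_updates(
    clips: Sequence[Any],
    make_update: Callable[..., Dict[str, Any]] = _plain_update,
    max_players: int = MAX_AUDIO_PLAYERS,
) -> List[Dict[str, Any]]:
    """Return exactly max_players updates: visible for clips, hidden for the rest."""
    if len(clips) > max_players:
        raise ValueError(f"at most {max_players} clips are supported")
    # One visible player per generated clip.
    shown = [make_update(value=clip, visible=True) for clip in clips]
    # Remaining players are cleared and hidden so the output count never changes.
    hidden = [
        make_update(value=None, visible=False)
        for _ in range(max_players - len(clips))
    ]
    return shown + hidden
```

Because the list length is constant, every output component receives an update on every call, whatever sample count the user picks.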

Checklist

  • [x] Code follows project style and PEP 8.
  • [x] Comprehensive docstrings and inline comments.
  • [x] No new runtime dependencies (except gradio).
  • [x] Manual tests: GPU (CUDA 12.4) and CPU paths.

eoffermann, Apr 18 '25 21:04