Make-An-Audio
feat(ui): introduce single‑file Gradio demo (app.py)
Summary
Adds a self-contained Gradio front-end that turns text prompts into audio clips using the existing diffusion sampler and BigVGAN vocoder. Launch it with one command, explore results in the browser, download clips on demand—no disk writes unless the user clicks Download.
Highlights
- Zero configuration: `python app.py` opens http://127.0.0.1:7860.
- Five inputs: prompt, DDIM steps, duration, guidance scale, sample count (up to 10); a rough UI sketch follows this list.
- Parallel previews: up to 10 audio players appear dynamically; each has a built-in download button.
- Stateless: all artefacts remain in RAM; nothing persists after the session.
- Efficient cold-start: models load once at import; subsequent generations reuse them.
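For orientation, the layout described above might look roughly like the sketch below. It is a minimal illustration rather than the actual `app.py`: the slider ranges and defaults are guesses, and the callback is stubbed (its real return contract is sketched under Implementation notes).

```python
import gradio as gr

MAX_AUDIO_PLAYERS = 10  # upper bound on parallel previews


def generate_and_update(prompt, ddim_steps, duration, guidance_scale, n_samples):
    # Stub: the real callback runs the diffusion sampler + BigVGAN vocoder.
    return [gr.update(visible=False)] * MAX_AUDIO_PLAYERS


with gr.Blocks() as demo:
    # Five inputs: prompt, DDIM steps, duration, guidance scale, sample count.
    prompt = gr.Textbox(label="Prompt")
    ddim_steps = gr.Slider(10, 250, value=100, step=1, label="DDIM steps")
    duration = gr.Slider(1, 10, value=10, step=1, label="Duration (s)")
    guidance_scale = gr.Slider(1.0, 10.0, value=3.0, step=0.5, label="Guidance scale")
    n_samples = gr.Slider(1, MAX_AUDIO_PLAYERS, value=1, step=1, label="Samples")
    generate = gr.Button("Generate")

    # Ten audio players are created up front but stay hidden until they receive
    # audio; each gr.Audio component ships with its own download button.
    players = [gr.Audio(label=f"Sample {i + 1}", visible=False) for i in range(MAX_AUDIO_PLAYERS)]

    generate.click(
        generate_and_update,
        inputs=[prompt, ddim_steps, duration, guidance_scale, n_samples],
        outputs=players,
    )

if __name__ == "__main__":
    demo.launch()
```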
How to run
```bash
python app.py
```
That’s it—the default browser will open automatically.
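For reviewers curious about the auto-open behaviour: Gradio exposes an `inbrowser` flag on `launch()`, so a call along the following lines would produce it. Whether `app.py` passes these arguments explicitly is an assumption; treat this as a sketch.

```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("Make-An-Audio demo")  # placeholder UI

# inbrowser=True opens the default browser; 7860 is Gradio's default port.
demo.launch(server_name="127.0.0.1", server_port=7860, inbrowser=True)
```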
Implementation notes
- Tested locally on CUDA 12.4 GPU and on a CPU-only machine.
- `generate_and_update` always returns a list of exactly `MAX_AUDIO_PLAYERS` `gr.update` objects, keeping Gradio's diffing predictable; a sketch of that contract follows this list.
- TODOs are embedded in the docstring (GPU OOM handling, input validation, seed control).
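A hedged sketch of that contract, not the actual implementation: the 16 kHz rate, zero-filled waveforms, and parameter names are placeholders. It fills one visible `gr.update` per generated clip and pads the rest with hidden updates so the list length is always `MAX_AUDIO_PLAYERS`.

```python
import numpy as np
import gradio as gr

MAX_AUDIO_PLAYERS = 10


def generate_and_update(prompt, ddim_steps, duration, guidance_scale, n_samples):
    # In the real app these waveforms come from the diffusion sampler + BigVGAN;
    # zeros stand in here so the sketch runs on its own.
    sr = 16000  # assumed sample rate for illustration
    clips = [np.zeros(int(sr * duration), dtype=np.float32) for _ in range(int(n_samples))]

    # One visible player per clip, handed to Gradio as in-memory (sr, array)
    # tuples, so nothing is written to disk.
    updates = [gr.update(value=(sr, clip), visible=True) for clip in clips]

    # Pad with hidden updates so the list length is always MAX_AUDIO_PLAYERS,
    # matching the fixed set of output components.
    updates += [gr.update(visible=False)] * (MAX_AUDIO_PLAYERS - len(updates))
    return updates


# Quick shape check: the callback always yields exactly MAX_AUDIO_PLAYERS updates.
assert len(generate_and_update("rain on a tin roof", 100, 5, 3.0, 3)) == MAX_AUDIO_PLAYERS
```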
Checklist
- [x] Code follows project style and PEP 8.
- [x] Comprehensive docstrings and inline comments.
- [x] No new runtime dependencies (except `gradio`).
- [x] Manual tests: GPU (CUDA 12.4) and CPU paths.