FireRedASR
[ROCm] Add Torch SDPA and xFormers optimization for FireRedASR
Hi FireRedTeam, thanks for your great work!
This PR adds FireRedASR optimizations for ROCm, targeting AMD Instinct MI300+ GPUs.
- Add `docker/Dockerfile.rocm` to quickly set up a ROCm 7 environment for deployment.
- Add PyTorch SDPA and xFormers attention support for performance optimization, selectable through an environment variable: `ATTENTION_BACKEND="SDPA"` or `ATTENTION_BACKEND="XFORMERS"`.
- Fix the `torch.load` issue on torch >= 2.6 by passing `weights_only=False`.
- Add benchmark scripts and torch profiling support for performance analysis of the different attention backends.
- Add a guide to FireRedASR optimization on ROCm in README.md.
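
For reviewers who want a quick picture of the backend switch, here is a minimal sketch of how an `ATTENTION_BACKEND` variable could select between the three code paths. It is not the code in this PR: the function names, the `NATIVE` default when the variable is unset, and the `(batch, heads, seq_len, head_dim)` layout are assumptions for illustration, and masking and positional terms are left out.

```python
import math
import os

# Assumed set of accepted values; the PR description names SDPA and XFORMERS,
# and "NATIVE" is a hypothetical name for the unoptimized default path.
VALID_BACKENDS = {"NATIVE", "SDPA", "XFORMERS"}


def resolve_attention_backend(environ=None):
    """Read ATTENTION_BACKEND and return its normalized (upper-case) name."""
    environ = os.environ if environ is None else environ
    name = environ.get("ATTENTION_BACKEND", "NATIVE").strip().upper()
    if name not in VALID_BACKENDS:
        raise ValueError(f"Unsupported ATTENTION_BACKEND: {name!r}")
    return name


def attention(q, k, v, backend):
    """Unmasked attention; q, k, v are (batch, heads, seq_len, head_dim) tensors."""
    # Imported lazily so the selection logic above works without torch installed.
    import torch
    import torch.nn.functional as F

    if backend == "SDPA":
        # Fused kernel provided by PyTorch (available since torch 2.0).
        return F.scaled_dot_product_attention(q, k, v)
    if backend == "XFORMERS":
        import xformers.ops as xops

        # xFormers expects (batch, seq_len, heads, head_dim), so swap axes 1 and 2.
        out = xops.memory_efficient_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
        )
        return out.transpose(1, 2)
    # Native path: explicit softmax(QK^T / sqrt(d)) V.
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    return torch.matmul(torch.softmax(scores, dim=-1), v)
```

With this shape, a run would be switched from the shell, for example `ATTENTION_BACKEND="SDPA" python your_script.py`, where the script name is a placeholder.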
Here are performance results (RTF, real-time factor; lower is better) with the example audio (batch size = 1) on a single MI308X for your reference:
| ATTENTION_BACKEND | RTF | Performance gain vs Native |
|---|---|---|
| Native | 0.063 | / |
| Torch SDPA | 0.048 | 23.81% |
| xFormers Attention | 0.056 | 11.11% |
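
The gain column is consistent with a relative reduction in RTF against the native backend. The sketch below reproduces it from the RTF values in the table; the helper names are hypothetical, and the RTF helper assumes the usual definition (processing time divided by audio duration), which the benchmark scripts may compute differently.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Usual RTF definition (assumed here): processing time / audio duration."""
    return processing_seconds / audio_seconds


def gain_vs_native(native_rtf, backend_rtf):
    """Relative RTF reduction against the native backend, in percent."""
    return (native_rtf - backend_rtf) / native_rtf * 100.0


# RTF values taken from the table above.
native = 0.063
for name, rtf in [("Torch SDPA", 0.048), ("xFormers Attention", 0.056)]:
    print(f"{name}: {gain_vs_native(native, rtf):.2f}%")
```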
Thanks for your PR, we will review.