transformers.js
[Severe] Memory leak in the WebGPU Whisper transcription pipeline
System Info
Using transformers.js v3 in the latest Chrome release on Windows 10.
GPU: Nvidia GTX 1080 (8GB)
Environment/Platform
- [X] Website/web-app
- [ ] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] Other (e.g., VSCode extension)
Description
Transcribing with a Whisper model on WebGPU does not dispose of the tensors after the pipeline finishes. I verified this with nvidia-smi while transcribing a .wav file into text: GPU memory consumption keeps growing until it either goes out of memory (on smaller GPUs) or loses the device mid-computation (producing a console error saying 'device is lost').
Reproduction
- Run the transcription pipeline on a .wav file of at least 1 minute (see the sketch below this list)
- Watch GPU memory consumption during the computation; it should keep increasing while the computation is running
- Verify whether the tensors are correctly disposed of after the computation finishes (currently the data is not released on the GPU, which causes the leak)
- Longer audio sequences make this easy to verify, since they widen the gap between resting GPU memory and memory used during the computation
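A minimal reproduction sketch, assuming the v3 `@huggingface/transformers` package, `onnx-community/whisper-tiny.en` as an example checkpoint, and `audio.wav` as a placeholder for any >= 1 minute file; any Whisper checkpoint on WebGPU should show the same growth:

```js
import { pipeline } from '@huggingface/transformers';

// Build the ASR pipeline on WebGPU.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'onnx-community/whisper-tiny.en',
  { device: 'webgpu' },
);

// While this call runs, watch GPU memory in another terminal, e.g.:
//   nvidia-smi --query-gpu=memory.used --format=csv -l 1
const output = await transcriber('audio.wav', {
  chunk_length_s: 30,
  return_timestamps: true,
});
console.log(output.text);

// Memory reported by nvidia-smi keeps climbing during the call and is not
// released after it resolves.
```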
Ideas (from Ratchet)
I came across a great inference architecture at https://github.com/huggingface/ratchet/blob/master/ARCHITECTURE.md, which reduces memory consumption for encoder-decoder models like Whisper by supporting both static and dynamic graphs: the encoder is kept completely static, while the decoder runs under a dynamic graph because of KV caching.
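A toy sketch of that split, to illustrate the idea only (plain JS, no real inference, and not Ratchet's or transformers.js's actual API): the encoder always sees the same fixed-size input, so its buffers can be allocated once up front, while the decoder only allocates small per-token KV-cache entries and drops them once generation ends.

```js
// Encoder side: fixed 30 s mel input, so one buffer allocated once and reused.
const ENCODER_INPUT_SIZE = 80 * 3000;
const encoderBuffer = new Float32Array(ENCODER_INPUT_SIZE);

function encode(mel) {
  encoderBuffer.set(mel);                 // "static graph": same buffer every run
  return encoderBuffer;
}

// Decoder side: the KV cache grows by one entry per generated token.
function decodeStep(encoded, kvCache) {
  kvCache.push(new Float32Array(1024));   // "dynamic graph": per-step allocation
  return Math.floor(Math.random() * 51865); // pretend token id
}

function transcribeToy(mel, maxTokens = 16) {
  const encoded = encode(mel);
  const kvCache = [];
  const tokens = [];
  for (let i = 0; i < maxTokens; i++) {
    tokens.push(decodeStep(encoded, kvCache));
  }
  kvCache.length = 0;                     // per-utterance state released when finished
  return tokens;
}

console.log(transcribeToy(new Float32Array(ENCODER_INPUT_SIZE)).length);
```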