transformers.js
[Severe] Memory leak in the WebGPU Whisper transcription pipeline
System Info
Using transformers.js v3 in the latest Chrome release on Windows 10.
GPU: Nvidia GTX 1080 (8GB)
Environment/Platform
- [X] Website/web-app
- [ ] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] Other (e.g., VSCode extension)
Description
Transcribing with a Whisper model on WebGPU does not dispose of the tensors after the pipeline finishes. I verified this with nvidia-smi while transcribing a .wav file into text: GPU memory consumption keeps growing until it either goes out of memory (on smaller GPUs) or loses the device mid-computation (producing a console error saying 'device is lost').
Reproduction
- Run the transcription pipeline on a .wav file of at least 1 minute (see the sketch below this list)
- Watch GPU memory consumption during the computation; it should keep increasing while the computation is running
- Verify whether the tensors are correctly disposed of after the computation finishes (currently the data is not released on the GPU, which causes the leak)
- Longer audio sequences make this easy to verify, since they widen the gap between resting GPU memory and memory used during the computation
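A minimal reproduction sketch, assuming the v3 `@huggingface/transformers` package, `onnx-community/whisper-tiny.en` as an example checkpoint, and `audio.wav` as a placeholder for any >= 1 minute file; any Whisper checkpoint on WebGPU should show the same growth:

```js
import { pipeline } from '@huggingface/transformers';

// Build the ASR pipeline on WebGPU.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'onnx-community/whisper-tiny.en',
  { device: 'webgpu' },
);

// While this call runs, watch GPU memory in another terminal, e.g.:
//   nvidia-smi --query-gpu=memory.used --format=csv -l 1
const output = await transcriber('audio.wav', {
  chunk_length_s: 30,
  return_timestamps: true,
});
console.log(output.text);

// Memory reported by nvidia-smi keeps climbing during the call and is not
// released after it resolves.
```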
Ideas (from Ratchet)
I came across a great inference architecture at https://github.com/huggingface/ratchet/blob/master/ARCHITECTURE.md, which reduces memory consumption for encoder-decoder models like Whisper by supporting both static and dynamic graphs: the encoder is kept completely static, while the decoder runs under a dynamic graph because of KV caching.
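A toy sketch of that split, to illustrate the idea only (plain JS, no real inference, and not Ratchet's or transformers.js's actual API): the encoder always sees the same fixed-size input, so its buffers can be allocated once up front, while the decoder only allocates small per-token KV-cache entries and drops them once generation ends.

```js
// Encoder side: fixed 30 s mel input, so one buffer allocated once and reused.
const ENCODER_INPUT_SIZE = 80 * 3000;
const encoderBuffer = new Float32Array(ENCODER_INPUT_SIZE);

function encode(mel) {
  encoderBuffer.set(mel);                 // "static graph": same buffer every run
  return encoderBuffer;
}

// Decoder side: the KV cache grows by one entry per generated token.
function decodeStep(encoded, kvCache) {
  kvCache.push(new Float32Array(1024));   // "dynamic graph": per-step allocation
  return Math.floor(Math.random() * 51865); // pretend token id
}

function transcribeToy(mel, maxTokens = 16) {
  const encoded = encode(mel);
  const kvCache = [];
  const tokens = [];
  for (let i = 0; i < maxTokens; i++) {
    tokens.push(decodeStep(encoded, kvCache));
  }
  kvCache.length = 0;                     // per-utterance state released when finished
  return tokens;
}

console.log(transcribeToy(new Float32Array(ENCODER_INPUT_SIZE)).length);
```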