GPU memory overflow in ROCm environment

Open labeldock opened this issue 1 year ago • 3 comments

I'm looking for an option to release GPU memory after Whisper tasks. Sometimes my PC shuts down because GPU memory overflows. Running on Debian (Bookworm) + Docker (rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2) + RX6800.

I am aware that there is no official support for Linux or Docker environments. However, I believe that supporting these options would definitely have a positive impact in the future.

  • Free GPU memory immediately after GENERATE SUBTITLE is finished
  • Free GPU memory if it is not used for a certain period after GENERATE SUBTITLE is finished

I have no experience with Python, PyTorch, etc., so my ability to interpret the project is limited. If it can be determined whether this feature is feasible to implement, I will try to contribute in any way I can.

labeldock avatar Apr 04 '24 05:04 labeldock

Hi! We've attempted to address this in #15. Right now, we're calling torch.cuda.empty_cache() after each transcription.
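The cleanup is roughly the following pattern (a simplified sketch, not the exact code in the repo; the function name and the transcribe() call are just stand-ins for whatever inference the app runs):

import gc
import torch

def transcribe_with_cleanup(model, audio_path):
    # Run the Whisper inference (placeholder call).
    result = model.transcribe(audio_path)

    # Drop references to intermediate objects and ask the caching allocator
    # to return unused blocks to the driver. torch.cuda also covers
    # ROCm/HIP builds of PyTorch.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    return result

Note that empty_cache() only releases memory the caching allocator is no longer using; memory still held by live tensors, such as the loaded model weights, stays allocated.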

If anyone has an idea or a PR for a better solution, it would be much appreciated!

jhj0517 avatar Apr 07 '24 18:04 jhj0517

@jhj0517 Thank you for your response. Unfortunately, in my environment the memory does not appear to be fully released. I have a follow-up question to understand whether this is an issue specific to Docker or ROCm environments.

Looking at the documentation at https://pytorch.org/docs/stable/notes/cuda.html#memory-management, there is a note that empty_cache() does not free "GPU memory occupied by tensors." How much GPU memory do the tensors typically occupy? Does this phenomenon also occur in Nvidia environments?
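From the same docs page, it looks like the split between tensor-occupied and cached memory could be checked with something like this (I could not verify it myself given my lack of Python experience; the function names are taken from the PyTorch memory-management docs):

import torch

# Memory currently held by live tensors -- the part that
# torch.cuda.empty_cache() cannot release.
allocated = torch.cuda.memory_allocated()

# Memory reserved by PyTorch's caching allocator, which includes the
# cached blocks that empty_cache() can return to the driver.
reserved = torch.cuda.memory_reserved()

print(f"allocated: {allocated / 1024**2:.0f} MiB, reserved: {reserved / 1024**2:.0f} MiB")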

Below are the details of my testing:

Screenshot 2024-04-09 151629

This is the baseline state. In the screenshot, "Graphics pipe" represents GPU activity and "VRAM" represents GPU memory usage. Other applications are running, occupying 4607M of graphics memory.

Screenshot 2024-04-09 151744

Screenshot 2024-04-09 152013

These are during the execution of large-v3; 11569M and 5466M were observed.

Screenshot 2024-04-09 151947

Some time after the execution of large-v3 ended, 11116M was observed.

Screenshot 2024-04-09 152031

This is during the execution with the medium model. It shows 8238M of memory in use.

Screenshot 2024-04-09 152122

This is just after the execution of the medium model has ended. 7838M was observed.

Screenshot 2024-04-09 153126

The whisper-webui process has been terminated. The memory has returned to its initial state.

labeldock avatar Apr 09 '24 06:04 labeldock

Thanks for sharing your experience! According to the documentation, "GPU memory occupied by tensors will not be freed" is expected behavior, because empty_cache() only releases the allocator's cached memory, not memory still held by live tensors.
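If the leftover usage is a problem, one possible approach (just a sketch using the plain openai-whisper API for illustration, not something the web UI currently does) is to drop the model itself after a task and reload it for the next one:

import gc
import torch
import whisper  # openai-whisper, used here only for illustration

model = whisper.load_model("medium")
result = model.transcribe("audio.wav")

# Free the tensor-occupied memory: drop the last reference to the model,
# let Python collect it, then return the cached blocks to the driver.
del model
gc.collect()
torch.cuda.empty_cache()

The trade-off is that the model has to be loaded again for the next task. The idle-timeout behavior from the original request could be built on top of the same cleanup, e.g. a timer that runs it after some period of inactivity.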

Here's someone's experience running this web UI on an AMD GPU:

  • #85

According to this, faster-whisper does not work with ROCm. So if you run into errors while running this web UI, disabling faster-whisper could help:

python app.py --disable_faster_whisper

jhj0517 avatar Apr 09 '24 10:04 jhj0517