                        GPU memory overflow in ROCm environment
I'm looking for an option to release GPU memory after Whisper tasks. Sometimes my PC shuts down due to overflowing GPU memory. Running on Debian (Bookworm) + Docker (rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2) + RX 6800.
I am aware that there is no official support for Linux or Docker environments. However, I believe that supporting these options would definitely have a positive impact in the future.
- Free GPU memory immediately after GENERATE SUBTITLE is finished
- Free GPU memory if it is not used for a certain period after GENERATE SUBTITLE is finished (a sketch of this follows below)
I have no experience with Python, PyTorch, etc., so my ability to interpret the project is limited. If it can be determined that implementing this feature is possible, I will try to contribute in any way I can.
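A minimal sketch of what the idle-timeout behavior could look like (hypothetical names: release_gpu_memory, schedule_release, and model_holder do not exist in the project; model_holder stands in for wherever the loaded Whisper model is kept):

```python
import gc
import threading

import torch

IDLE_TIMEOUT_SEC = 300          # hypothetical: unload the model after 5 minutes of inactivity
_idle_timer = None


def release_gpu_memory(model_holder):
    # Drop the reference to the loaded Whisper model so its tensors can be
    # garbage-collected, then return PyTorch's cached blocks to the driver.
    model_holder["model"] = None
    gc.collect()
    torch.cuda.empty_cache()


def schedule_release(model_holder):
    # (Re)start the idle timer after every GENERATE SUBTITLE run.
    global _idle_timer
    if _idle_timer is not None:
        _idle_timer.cancel()
    _idle_timer = threading.Timer(IDLE_TIMEOUT_SEC, release_gpu_memory, args=(model_holder,))
    _idle_timer.daemon = True
    _idle_timer.start()
```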
Hi! We've attempted to address this in #15.
Right now, we're calling torch.cuda.empty_cache() after each transcription.
If anyone has an idea or a PR for a better solution, it would be very much appreciated!
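For context, that pattern looks roughly like this (a sketch, not the project's actual code; run_transcription and model.transcribe are placeholders for whatever runs the Whisper model):

```python
import torch


def run_transcription(model, audio_path):
    try:
        # Placeholder for the actual Whisper inference call.
        result = model.transcribe(audio_path)
    finally:
        # Release PyTorch's cached-but-unused GPU memory back to the driver.
        # This does NOT free memory still referenced by live tensors,
        # e.g. the loaded model weights.
        torch.cuda.empty_cache()
    return result
```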
@jhj0517 Thank you for your response. Unfortunately, in my environment the memory does not appear to be fully released. I have a follow-up question to understand whether this is an issue specific to Docker or ROCm environments.
Checking the documentation at https://pytorch.org/docs/stable/notes/cuda.html#memory-management, there is a section stating that with empty_cache(), "occupied GPU memory by tensors will not be freed." How large is the GPU memory occupied by tensors? Is this behavior also present in Nvidia environments?
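For reference, PyTorch itself can report how much memory is held by live tensors versus by its caching allocator; these counters also work on ROCm builds, where the HIP backend is exposed through the torch.cuda namespace:

```python
import torch

allocated = torch.cuda.memory_allocated()   # bytes currently held by live tensors
reserved = torch.cuda.memory_reserved()     # bytes held by the caching allocator (tensors + cache)
print(f"allocated by tensors: {allocated / 2**20:.0f} MiB")
print(f"reserved by PyTorch:  {reserved / 2**20:.0f} MiB")
print(torch.cuda.memory_summary())          # detailed breakdown
```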
Below are the details of my testing:
This is the baseline state. The Graphics pipe value shown in the screenshot represents GPU usage, and VRAM represents GPU memory. Other applications are running, occupying 4607M of graphics memory.
This is during the execution of large-v3. 11569M and 5466M were observed.
Some time after the execution of large-v3 ended, 11116M was observed.
This is during the execution with the medium model. It shows 8238M of memory in use.
This is just after the execution of the medium model has ended. 7838M was observed.
The whisper-webui process has been terminated. The memory has returned to its initial state.
Thanks for sharing your experience! According to this, "occupied GPU memory by tensors will not be freed" is normal behavior, because empty_cache() only frees cached GPU memory that can be freed.
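A tiny illustration of that behavior (assuming a CUDA- or ROCm-enabled PyTorch build with enough free VRAM):

```python
import gc
import torch

x = torch.empty(1024, 1024, 256, device="cuda")  # a ~1 GiB float32 tensor
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved())              # still ~1 GiB: the tensor is alive, so its block stays reserved

del x                                            # drop the last reference to the tensor
gc.collect()
torch.cuda.empty_cache()                         # now the block is unused and can be returned to the driver
print(torch.cuda.memory_reserved())              # ~0
```

In practice, this means the loaded model's weights keep their VRAM until the model object itself is dropped.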
Here's someone's experience running this web UI on an AMD GPU:
- #85
 
According to that report, faster-whisper does not work with ROCm. So if you encounter any errors while running this web UI, disabling faster-whisper could help:
python app.py --disable_faster_whisper