Question about the "CUDA failed with error out of memory" error
Hi, I have recently changed my server, and with it I now have an Intel i5 10th-gen iGPU and an NVIDIA GTX 1650. I am installing openai-whisper with the faster_whisper engine; this is the docker-compose:
services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    container_name: Openai-Whisper
    environment:
      - ASR_MODEL=large
      - ASR_ENGINE=faster_whisper
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - 9000:9000
    restart: unless-stopped
I was hoping to be able to use the large model, but I think it is too big: I get the error "CUDA failed with error out of memory", so it seems it wants to load the whole model into the video card's RAM. Maybe what I am asking is impossible, but I don't know how the whole system works, and I don't see any mounted volumes, so I ask hoping the question is not too stupid: can't you download the model locally and use it without the need for it all to be loaded into memory? Or is there a way to share the system RAM with the video card RAM when needed?
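For context, what I had imagined was caching the model on disk with a volume added under the whisperasr service, something like the sketch below (the /root/.cache path is only my guess at where the image stores downloaded models), though I suppose that would only avoid re-downloading, not the loading into memory:

    volumes:
      # guessed cache location inside the image; would only persist downloaded model files
      - ./whisper-cache:/root/.cache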
Use a smaller model or use device="cpu".
can't you download the model locally and use it without the need for it all to be loaded into memory?
You can't.
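To illustrate what device="cpu" means at the faster_whisper level (a minimal sketch only; the model name, audio file, and compute_type are example values, not the webservice's defaults):

from faster_whisper import WhisperModel

# "small" fits comfortably in system RAM; int8 quantization further
# reduces the memory footprint when running on CPU.
model = WhisperModel("small", device="cpu", compute_type="int8")

# transcribe() returns a generator of segments plus metadata
segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")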
Thank you @Purfview for your reply. CPU is fine, but what is the difference if I run this docker-compose instead?
services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest
    environment:
      - ASR_MODEL=small
      - ASR_ENGINE=faster_whisper
    ports:
      - 9000:9000
    restart: unless-stopped
Is it the same, or is it better to stay with the GPU image and specify CPU mode?
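Or, if I understood the first suggestion correctly, the alternative would be to keep the GPU image and just use a smaller model, something like:

services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    container_name: Openai-Whisper
    environment:
      - ASR_MODEL=small
      - ASR_ENGINE=faster_whisper
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - 9000:9000
    restart: unless-stopped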