Alexandre Strube
So, is this a per-GPU thing? I tried it on an 8-GPU node and I got this: ``` python server.py --auto-devices --gpu-memory 20 20 20 20 20 20 20 20 20...
@mpetruc this looks like some other process took over the GPU memory. Did you check with `nvidia-smi` whether something else was running there? Is it still an issue for you?
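For reference, a minimal sketch (assuming PyTorch is installed) that reports free vs. total memory on each visible GPU; if "free" is much smaller than "total" before the model even loads, another process is likely holding GPU memory, and `nvidia-smi` will show which one:

```python
import torch

# Print free vs. total memory per visible GPU.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```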
@LetsGoFir did you solve it? There seem to be plenty of suggestions here. I will close this one, as the suggestions seem helpful enough, and it's not a bug of...
@infwinston Would you care to have a look? I fear that this would start to diverge more and more from main as time passes.
What do you have set for the `CUDA_VISIBLE_DEVICES` variable? And what shows up in `nvidia-smi`? The code itself is saying that this is a PyTorch bug, but in any case,...
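A quick way to check both things at once, sketched here with plain PyTorch: print what `CUDA_VISIBLE_DEVICES` is set to and which GPUs PyTorch can actually see. A mismatch between the two usually explains "device not found" or unexpected out-of-memory errors rather than a genuine PyTorch bug.

```python
import os
import torch

# What the environment restricts us to, and what PyTorch actually sees.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))
print("Visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")
```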
While that would be true for games, for LLMs it is not true that more GPUs == more performance. It turns out that there's a lot of data movement going on among...
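To make the data-movement point concrete, here is a minimal sketch (assuming at least two CUDA GPUs are visible) that times a cross-GPU tensor copy; this is the kind of inter-device traffic that keeps layer-split multi-GPU inference from scaling linearly with GPU count:

```python
import torch

if torch.cuda.device_count() >= 2:
    x = torch.randn(4096, 4096, device="cuda:0")  # activation-sized tensor
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    y = x.to("cuda:1")  # cross-GPU copy, typically over PCIe or NVLink
    end.record()
    torch.cuda.synchronize()
    print(f"GPU0 -> GPU1 copy took {start.elapsed_time(end):.2f} ms")
```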
I am not sure this applies here. The OP is talking about local inference on a single compute node with 4 GPUs. Are we talking about the same thing?