Alexandre Strube
So, is this a per-GPU thing? I tried it on an 8-GPU node and I got this: ``` python server.py --auto-devices --gpu-memory 20 20 20 20 20 20 20 20 20...
@mpetruc this looks like some other process took over the GPU memory. Did you check with `nvidia-smi` whether something else was running there? Is it still an issue for you?
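For reference, a minimal sketch (assuming PyTorch is installed) that reports free vs. total memory on each visible GPU; if "free" is much smaller than "total" before the model even loads, another process is likely holding GPU memory, and `nvidia-smi` will show which one:

```python
import torch

# Print free vs. total memory per visible GPU.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```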
@LetsGoFir did you solve it? There seem to be plenty of suggestions here. I will close this one, as the suggestions seem helpful enough, and it's not a bug of...
@infwinston Would you care to have a look? I fear that this would start to diverge more and more from main as time passes.
What do you have set for the `CUDA_VISIBLE_DEVICES` variable? And what shows up in `nvidia-smi`? The code itself is saying that this is a PyTorch bug, but in any case,...
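A quick way to check both things at once, sketched here with plain PyTorch: print what `CUDA_VISIBLE_DEVICES` is set to and which GPUs PyTorch can actually see. A mismatch between the two usually explains "device not found" or unexpected out-of-memory errors rather than a genuine PyTorch bug.

```python
import os
import torch

# What the environment restricts us to, and what PyTorch actually sees.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))
print("Visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")
```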
While that would be true for games, for LLMs it is not true that more GPUs == more performance. It turns out that there's a lot of data movement going on among...
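To make the data-movement point concrete, here is a minimal sketch (assuming at least two CUDA GPUs are visible) that times a cross-GPU tensor copy; this is the kind of inter-device traffic that keeps layer-split multi-GPU inference from scaling linearly with GPU count:

```python
import torch

if torch.cuda.device_count() >= 2:
    x = torch.randn(4096, 4096, device="cuda:0")  # activation-sized tensor
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    y = x.to("cuda:1")  # cross-GPU copy, typically over PCIe or NVLink
    end.record()
    torch.cuda.synchronize()
    print(f"GPU0 -> GPU1 copy took {start.elapsed_time(end):.2f} ms")
```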
I am not sure this applies here. The OP is talking about local inference on a single compute node with 4 GPUs. Are we talking about the same thing?