Nguyen Nhi Thanh Tai comments




Nguyen Nhi Thanh Tai

ray OOM in tensor parallel

For anybody running vLLM on Triton server: Triton server will automatically run your LLM instance on every available GPU. So if you have 2 GPUs and you run with --tensor-parallel-size 2....
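
A rough back-of-the-envelope sketch of why this setup can run out of memory. It assumes, as the comment describes, that Triton starts one instance per GPU and that each instance shards the model across `--tensor-parallel-size` GPUs. The function name and the even-spread model are purely illustrative and not part of vLLM or Triton.

```python
def model_shards_per_gpu(num_gpus: int, tensor_parallel_size: int, instances: int) -> float:
    """Average number of model shards each GPU has to hold.

    Assumption (illustrative only): every instance places one shard on each of
    `tensor_parallel_size` GPUs, and shards are spread evenly over all GPUs.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return instances * tensor_parallel_size / num_gpus


num_gpus = 2
tensor_parallel_size = 2

# Assumed behavior from the comment: one instance is started per GPU.
one_instance_per_gpu = num_gpus
print(model_shards_per_gpu(num_gpus, tensor_parallel_size, one_instance_per_gpu))  # 2.0

# For comparison: a single instance spanning both GPUs.
print(model_shards_per_gpu(num_gpus, tensor_parallel_size, 1))  # 1.0
```

Under these assumptions, each GPU ends up holding two shards instead of one, which doubles the memory demand per GPU compared with a single tensor-parallel instance.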