thangnguyenduc1-vti
@fanyang89 @Salomonmejia1 thank you guys very much for the reference. Today, I deployed successfully on dual RTX 5080s. For anyone looking for a solution that runs on dual RTX 5080s,...
I still hope the vLLM team has plans to handle this natively in `vllm serve` instead of running through Docker; a sketch of both is below.
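For reference, here is a minimal sketch of the Docker-based route, not the exact command from this thread: the image tag and model name are placeholders, and a Blackwell card (sm_120) may require a nightly or CUDA 12.8 build of the image rather than `latest`.

```bash
# Minimal sketch, assuming the official vLLM OpenAI-compatible image.
# Model name and image tag are placeholders; RTX 5080 (Blackwell, sm_120)
# may need a nightly / CUDA 12.8 build.
docker run --gpus all \
  --ipc=host \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2
```

Once published wheels support sm_120, the native equivalent should just be `vllm serve meta-llama/Llama-3.1-8B-Instruct --tensor-parallel-size 2`. With `--tensor-parallel-size 2`, the ~16GB of BF16 weights for an 8B model (8B params × 2 bytes) are split across both cards, roughly 8GB per GPU, leaving headroom on each 16GB card for the KV cache.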
> Hi [@thangnguyenduc1-vti](https://github.com/thangnguyenduc1-vti), how do you manage to fit an 8B model into 16GB of VRAM? I understand that even if you're using a dual-GPU system, you don't have 32GB of...