aphrodite1028
And if we update the k8s-device-plugin version, for example from 0.15.0 to 0.16.0-rc, while some CUDA process instances are already running on the host machine and in Docker: after the nvidia-cuda-mps-control container is rerun, the nvidia-cuda-mps-server does not...
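One way to verify whether the MPS server survived the control daemon restart is to query the control daemon directly with its documented `get_server_list` command. This is a minimal sketch, assuming the `nvidia-cuda-mps-control` binary is on PATH and that the pipe directory matches the one the plugin configured (the path used here is illustrative):

```python
import os
import subprocess

# Pipe directory the MPS control daemon listens on. /tmp/nvidia-mps is the
# MPS default, but the device plugin may configure a different path
# (this value is an assumption for illustration).
env = dict(os.environ, CUDA_MPS_PIPE_DIRECTORY="/tmp/nvidia-mps")

# "get_server_list" is a documented nvidia-cuda-mps-control command; it
# prints the PIDs of all running nvidia-cuda-mps-server instances.
result = subprocess.run(
    ["nvidia-cuda-mps-control"],
    input="get_server_list\n",
    capture_output=True,
    text=True,
    env=env,
)

pids = result.stdout.split()
if pids:
    print(f"MPS server(s) running: {pids}")
else:
    print("No MPS server running after the control daemon restart.")
```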
I have the same issue using MPS with a CUDA process in Docker; driver 535.129.03, NVDP version 0.15.0-rc.1.
> There is a known issue with 0.15.0-rc.1 where memory limits were not correctly applied. This will be addressed in v0.15.0-rc.2 which we will release soon. OK, understood, thanks...
> @aphrodite1028 @ysz-github we have just released https://github.com/NVIDIA/k8s-device-plugin/releases/tag/v0.15.0-rc.2 which should address this issue. Please let us know if you're still experiencing problems. I found https://github.com/NVIDIA/k8s-device-plugin/blob/main/cmd/mps-control-daemon/mps/daemon.go#L77-L85 here. If I do not...
> @aphrodite1028 . You shouldn't need to do anything special in your user container. The system starts the MPS server for all GPUs on the machine and your client will...
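For context on how the client side attaches: the CUDA runtime connects to MPS through the pipe directory named in CUDA_MPS_PIPE_DIRECTORY. A minimal sanity check from inside a user container might look like the sketch below (the default directory is an assumption; when the device plugin manages MPS it injects the actual value into the container environment):

```python
import os

# The CUDA runtime looks for the MPS control pipe in this directory.
# /tmp/nvidia-mps is the MPS default; the k8s-device-plugin may set a
# different path via the container environment.
pipe_dir = os.environ.get("CUDA_MPS_PIPE_DIRECTORY", "/tmp/nvidia-mps")

# MPS exposes a named pipe called "control" inside the pipe directory.
if os.path.exists(os.path.join(pipe_dir, "control")):
    print(f"MPS control pipe found in {pipe_dir}; CUDA clients will attach to MPS.")
else:
    print(f"No MPS control pipe in {pipe_dir}; CUDA clients will run without MPS.")
```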
> There is an open-source plugin for CoPilot that could be converted. https://github.com/pannous/jini-plugin Thanks for your reply, I will try it.
> @aphrodite1028 was it successful? Yes, it works for me, but it needs some development. Thanks.
> Thank you, chatglm2-6b works in AutoModelForCausalLM mode and the batch size should be 1. When batch > 1, I got the error below: { "error": "Request failed during generation:...
> > --max-concurrent-requests 1 Thanks, I will try it.
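The suggested flag limits the server to one in-flight request, which matches the observation that chatglm2-6b only generated correctly at batch size 1. As a point of reference, a minimal single-request generation via AutoModelForCausalLM might look like this sketch (the model ID and generation parameters are assumptions, not taken from the thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# chatglm2-6b ships custom modeling code, so trust_remote_code is required.
# The model ID below is the public Hugging Face repo; it is assumed here,
# not taken from the original thread.
model_id = "THUDM/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

# One prompt at a time (batch size 1), matching the working configuration
# described above.
inputs = tokenizer("What is NVIDIA MPS?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```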