aphrodite1028

Results: 13 comments by aphrodite1028

And if we update the k8s-device-plugin version, for example from 0.15.0 to 0.16.0-rc, while some CUDA process instances are already running on the host machine and in Docker: after the nvidia-cuda-mps-control container is restarted, the nvidia-cuda-mps-server does not...

I have the same issue using MPS with a CUDA process in Docker; the driver is 535.129.03 and the nvdp version is 0.15.0-rc.1.

> There is a known issue with 0.15.0-rc.1 where memory limits were not correctly applied. This will be addressed in v0.15.0-rc.2, which we will release soon. OK, understood, thanks...

> @aphrodite1028 @ysz-github we have just released https://github.com/NVIDIA/k8s-device-plugin/releases/tag/v0.15.0-rc.2 which should address this issue. Please let us know if you're still experiencing problems. I found https://github.com/NVIDIA/k8s-device-plugin/blob/main/cmd/mps-control-daemon/mps/daemon.go#L77-L85 here. If I do not...
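
For anyone checking the same thing: one way to see what limits the MPS control daemon has actually applied is to query it directly with `nvidia-cuda-mps-control`. This is only a sketch; it assumes shell access to the node (or to the mps-control-daemon container), and the pipe directory path below is illustrative rather than taken from this thread.

```shell
# Point at the pipe directory the control daemon was started with.
# /tmp/nvidia-mps is the MPS default; the k8s-device-plugin mounts its own
# directory, so adjust this path for your deployment (illustrative value).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps

# List the MPS servers the control daemon has started.
echo get_server_list | nvidia-cuda-mps-control

# Show the default active-thread percentage applied to new clients.
echo get_default_active_thread_percentage | nvidia-cuda-mps-control
```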

> @aphrodite1028 . You shouldn't need to do anything special in your user container. The system starts the MPS server for all GPUs on the machine and your client will...
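
As a concrete illustration of the setup under discussion, here is a sketch of enabling MPS sharing through the plugin's config file and Helm chart, following the pattern documented in the project's README. The replica count of 4, the file path, and the chart version are illustrative values, not taken from this thread.

```shell
# Write a device-plugin config that enables MPS sharing. Each replica
# receives an equal share of the GPU's memory and compute threads.
cat <<EOF > /tmp/dp-mps-config.yaml
version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
EOF

# Deploy the plugin with that config via its Helm chart.
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --version 0.15.0 \
  --set-file config.map.config=/tmp/dp-mps-config.yaml
```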

> There is an open-source plugin for CoPilot that could be converted. https://github.com/pannous/jini-plugin Thanks for your reply, I will try it.

> @aphrodite1028 was it successful? Yes, it works for me, but it needs some development. Thanks.

> > Thank you, chatglm2-6b works in AutoModelForCausalLM mode and the batch size should be 1. When batch > 1, I got the error below: { "error": "Request failed during...

> Thank you, chatglm2-6b works in AutoModelForCausalLM mode and the batch size should be 1. When batch > 1, I got the error below: { "error": "Request failed during generation:...

> > --max-concurrent-requests 1 Thanks, I will try it.
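
For reference, assuming this thread concerns Hugging Face text-generation-inference (the `--max-concurrent-requests` launcher flag and the error format suggest it, though the thread does not name the server), a minimal sketch of applying the suggested workaround when serving chatglm2-6b; the image tag and port mapping are illustrative:

```shell
# Run the inference server with concurrency capped at 1, working around
# the batch > 1 generation failure described above. chatglm2-6b ships
# custom modeling code, hence --trust-remote-code.
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:1.4 \
  --model-id THUDM/chatglm2-6b \
  --max-concurrent-requests 1 \
  --trust-remote-code
```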