dojoeisuke
> can you successfully launch vgpu task?

No. The status of the vcjob is pending.
> > > can you successfully launch vgpu task? > > > > > > No. The status of vcjob is pending > > Thanks for your reply, can provide...
> "resource in cluster is overused" message means job is reject by enqueue action.

Upon checking the volcano-scheduler log, it seems that the cause is the absence of "volcano.sh/vgpu-number" in...
Now the `volcano-device-plugin` pod on the GPU node outputs "could not load NVML library".

```bash
root@k8s-tryvolcano-m001:~# k -n kube-system logs volcano-device-plugin-jtfxz
I1027 05:40:47.592928 1 main.go:77] Loading NVML
I1027 05:40:47.593106 1 main.go:79] Failed...
```
> @dojoeisuke can i see /etc/docker/daemon.json on that GPU node?

```bash
root@k8s-tryvolcano-w004:~# cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```
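For anyone checking the same thing on their own node, here is a minimal sketch that parses a `daemon.json` like the one above and confirms that `nvidia` is both registered as a runtime and set as the default. The function name and the embedded sample are hypothetical, and the assumption that the device plugin needs `nvidia` as Docker's default runtime is mine, not something confirmed in this thread.

```python
import json

# Sample content copied from the daemon.json shown above.
SAMPLE_DAEMON_JSON = """
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
"""


def nvidia_is_default_runtime(daemon_json_text: str) -> bool:
    """Return True if 'nvidia' is a registered runtime and the default one.

    Hypothetical helper: it only inspects the JSON text and does not
    check that nvidia-container-runtime is actually installed.
    """
    config = json.loads(daemon_json_text)
    runtimes = config.get("runtimes", {})
    return config.get("default-runtime") == "nvidia" and "nvidia" in runtimes


if __name__ == "__main__":
    print(nvidia_is_default_runtime(SAMPLE_DAEMON_JSON))
```

On a real node the text would come from reading `/etc/docker/daemon.json` instead of the embedded sample.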
> can this issue be reproduced without install Gpu Operator?

I tried it. The `volcano-device-plugin` pod on the GPU node produced the following error output.

```
I1030 05:12:02.805254 1 main.go:77] Loading NVML...
```
The GPU node had not been prepared properly. With Kubernetes 1.24, it was necessary to install cri-dockerd and specify cri-dockerd as the cri-socket for the kubelet.

- https://github.com/Mirantis/cri-dockerd

As a...
Next I tried to launch [an example manifest](https://raw.githubusercontent.com/volcano-sh/devices/master/examples/vgpu-case02.yml).

Note: the following fields were changed:

- image: `nvidia/cuda:10.1-base-ubuntu18.04` -> `nvidia/cuda:12.1.0-base-ubuntu18.04`
- vgpu-number: `1` -> `2`

It failed due to the lack...
> > > can this issue be reproduced without install Gpu Operator? > > > > > > I tried it. > > `volocano-device-plugin` pod on GPU node produced the...
@archlitchi Regarding https://github.com/volcano-sh/volcano/issues/3160#issuecomment-1784644826: since "volcano.sh/vgpu-number" is now part of the allocatable resources, would it be better to close this issue? Also, should I submit a new issue about https://github.com/volcano-sh/volcano/issues/3160#issuecomment-1784664057...
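As a rough way to confirm that "volcano.sh/vgpu-number" really shows up in a node's allocatable resources, the sketch below reads the JSON that `kubectl get node <node-name> -o json` prints and looks up the resource under `.status.allocatable`. The helper name and the sample node document (including the value `"10"`) are hypothetical and only illustrate the shape of the data; they are not output from the cluster in this issue.

```python
import json

RESOURCE_NAME = "volcano.sh/vgpu-number"

# Hypothetical, heavily trimmed node object for illustration only.
SAMPLE_NODE_JSON = """
{
    "kind": "Node",
    "status": {
        "allocatable": {
            "cpu": "4",
            "volcano.sh/vgpu-number": "10"
        }
    }
}
"""


def allocatable_count(node_json_text: str, resource: str = RESOURCE_NAME) -> int:
    """Return the allocatable count of `resource`, or 0 if it is absent.

    Assumes the resource is an extended resource, whose quantity is
    reported as a plain integer string.
    """
    node = json.loads(node_json_text)
    allocatable = node.get("status", {}).get("allocatable", {})
    value = allocatable.get(resource)
    return int(value) if value is not None else 0


if __name__ == "__main__":
    print(allocatable_count(SAMPLE_NODE_JSON))
```

A result of 0 would match the earlier state in this issue, where the resource was missing from the node and the vcjob stayed pending.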