rongfu.leng
1. Given two nodes, one with a GPU and one without. 2. Apply a pod YAML that requests `nvidia.com/gpu` resources; sometimes the pod is scheduled to the node without a GPU.
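To make the reproduction concrete, here is a minimal sketch of a pod manifest that requests the GPU resource, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). The pod name `gpu-test` and the image `example.com/cuda-sample:latest` are hypothetical placeholders, not taken from the original report.

```python
import json

# Extended resource name advertised by the NVIDIA device plugin.
GPU_RESOURCE = "nvidia.com/gpu"


def build_gpu_pod(name, image, gpus=1):
    """Return a Pod manifest (as a dict) whose single container requests `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {GPU_RESOURCE: gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    pod = build_gpu_pod("gpu-test", "example.com/cuda-sample:latest")
    print(json.dumps(pod, indent=2))
```

With a manifest like this, the expectation is that the pod only lands on a node that advertises `nvidia.com/gpu`; the report describes it occasionally landing on the node without a GPU.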
https://github.com/NVIDIA/k8s-device-plugin?tab=readme-ov-file#with-cuda-mps
Issue: https://github.com/containerd/containerd/issues/5046 Usage: UNIX: ``` ctr run --device= ctr run --device=: ctr run --device=:: ``` Windows: (the CLI supports setting only the HOST PATH, not the CONTAINER PATH) ``` ctr run...
Add `matchConditions` to the webhook when the rule covers the pod create operation, to check whether `object.spec.schedulerName` is the default Volcano scheduler name `volcano`; currently we don't consider a changed scheduler name. /kind feature...
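As a rough illustration of the idea, the sketch below builds a `matchConditions` entry carrying a CEL expression on `object.spec.schedulerName` and mirrors the same check in plain Python. It is not Volcano's actual implementation; the condition name `uses-volcano-scheduler` and both helper functions are hypothetical.

```python
DEFAULT_SCHEDULER = "volcano"


def scheduler_match_condition(scheduler_name=DEFAULT_SCHEDULER):
    """Build one matchConditions entry (fields: name, expression) for a webhook."""
    return {
        "name": "uses-volcano-scheduler",  # hypothetical condition name
        # CEL expression evaluated by the API server against the admitted object.
        "expression": f'object.spec.schedulerName == "{scheduler_name}"',
    }


def would_reach_webhook(pod, scheduler_name=DEFAULT_SCHEDULER):
    """Local approximation of the CEL check: True if the pod names this scheduler."""
    return pod.get("spec", {}).get("schedulerName") == scheduler_name


if __name__ == "__main__":
    print(scheduler_match_condition())
    print(would_reach_webhook({"spec": {"schedulerName": "volcano"}}))
    print(would_reach_webhook({"spec": {"schedulerName": "default-scheduler"}}))
```

The point of such a condition is that pods handled by other schedulers would bypass the webhook entirely, so they would not depend on the admission service being available.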
### Description When the volcano-admission pod crashes, it prevents me from creating other pods. ### Steps to reproduce the issue 1. Install volcano using helm install 2. Scale volcano-admission replicas to...
Fixes: https://github.com/NVIDIA/gpu-operator/issues/642
When I use drivers preinstalled on the host, other components may fail because the user did not disable the nouveau driver.
When my node already has the NVIDIA driver installed and I install gpu-operator with driver installation disabled, the node label `nvidia.com/gpu.deploy.driver` is `true` rather than `pre-installed`. Then enable driver install, node label...