rongfu.leng

Results 92 issues of rongfu.leng

1. Set up two nodes: one with a GPU and one without. 2. Apply a pod YAML that requests `nvidia.com/gpu` resources; sometimes the pod is scheduled onto the node without a GPU.

issue/stale

https://github.com/NVIDIA/k8s-device-plugin?tab=readme-ov-file#with-cuda-mps

Issue: https://github.com/containerd/containerd/issues/5046 Usage: UNIX: ``` ctr run --device= ctr run --device=: ctr run --device=:: ``` Windows: (the CLI supports setting only the HOST PATH, not the CONTAINER PATH) ``` ctr run...

needs-ok-to-test
size/S

Add `matchConditions` to the webhook when the rule covers the pod create operation, to check whether `object.spec.schedulerName` is the default Volcano scheduler name `volcano`. Currently we do not consider the case where the scheduler name has been changed. /kind feature...

kind/feature
ok-to-test
size/S

### Description When the volcano-admission pod crashes, it affects my ability to create other pods. ### Steps to reproduce the issue 1. Install Volcano with `helm install`. 2. Scale the volcano-admission replicas to...

help wanted
good first issue
kind/bug

Fixes: https://github.com/NVIDIA/gpu-operator/issues/642

When preinstalled drivers on the host are used, other components may fail because the user did not disable the nouveau driver.

feature

My node already has the NVIDIA driver installed, and I installed gpu-operator with driver installation disabled, but I found that the node label `nvidia.com/gpu.deploy.driver` is `true`, not `pre-installed`. When I then enable driver installation, the node label...