Evan Lezar comments

Results: 419 comments by Evan Lezar

Change merge strategy to preserve Plugins when importing Configs

@rayburgemeestre since cri-o already supports drop-in files for overriding the config, does following a similar mechanism here work too? See https://github.com/cri-o/cri-o/blob/e0e17ee187c9f52d870b80cee9116c4fd5ca279e/pkg/config/config.go#L699 I haven't dug too much into whether only leaves...

Improve e2e tests

@ArangoGutierrez this was merged as part of your e2e testing changes, correct?

K8s 1.24 failed to schedule using GPU-(error code CUDA driver

@luhong123 could you please confirm your device plugin and NVIDIA Container Toolkit versions?

Use mps on kubernetes

This is something that is under active development. We don't have a concrete release date yet, but are targeting the first quarter of 2024.

Use mps on kubernetes

> @igorgad you do not need to manually mount `/dev/shm` in your pod spec. The device-plugin, as part of its AllocateResponse, will make sure all the entities required for MPS...

Use mps on kubernetes

We have an issue to track making the shm size configurable. Would this be able to address your use case? What are typical values for the shared memory size?

Use mps on kubernetes

@ettelr we have an action item to allow the size of the `/dev/shm` that is created to be specified as part of the deployment. Would this work for your use...

How to mount containerPath to a hostPath for discover NVIDIA libraries w/o CDI spec

I have updated #666 to include a fix for this. An additional `hostDevRoot` helm value is added that can be explicitly set to `/` on systems where the root to...

MPS use error: Failed to allocate device vector A (error code all CUDA-capable devices are busy or unavailable)!

Could you try to update your workload to use the following container instead: `nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1`. Also, is the `nvidia` runtime configured as your default runtime, or are you using...

MPS use error: Failed to allocate device vector A (error code all CUDA-capable devices are busy or unavailable)!

> I found that it is possible to run the mps program directly on the host, but in the container it will prompt that `device(s) is/are busy or unavailable` Could...