
Removal of dependencies on cuda v10

Open mamccorm opened this issue 10 months ago • 7 comments

Hi there,

CUDA v10 has been EOL for some time; however, there still appear to be several references to it in various places in the project.

Are there any plans to remove the cuda v10 dependencies in the pipeline?

mamccorm avatar Apr 11 '24 19:04 mamccorm

@mamccorm yes, there are. The next release of DCGM will remove all references to CUDA v10. Thanks!

bmarchant avatar Apr 11 '24 22:04 bmarchant

Appreciate the reply @bmarchant, that's great to hear. Any ballpark timelines for when the next release may be due to land?

mamccorm avatar Apr 11 '24 22:04 mamccorm

@mamccorm We are trying to get it released in the next week or two.

bmarchant avatar Apr 12 '24 19:04 bmarchant

@mamccorm

I need to correct @bmarchant. The next major DCGM release, which is planned for later this year, will remove Cuda10. However, the upcoming release in the DCGM 3.x branch will still provide Cuda10 plugins.

The DCGM policy is to support three major Cuda versions in each release.

Could you provide more details on the issues that you are facing with Cuda10?

nikkon-dev avatar Apr 15 '24 00:04 nikkon-dev

Thanks @nikkon-dev for the follow-up, and much appreciated.

We're keen to leverage DCGM without a dependency on EOL software. Looks like CUDA v10 was added to the end-of-life section of GitLab ~9 months ago, and the binaries are also not redistributed anymore here.

Also cross-referencing some info from this doc:

  • https://docs.nvidia.com/deploy/cuda-compatibility/index.html
All CUDA releases supported through the lifetime of the datacenter driver branch. For example, R418 (CUDA 10.1) EOLs in March 2022
Branches R525, R515, R510, R465, R460, R455, R450, R440, R418, R410, R396, R390 are end of life and are not supported targets for compatibility

mamccorm avatar Apr 22 '24 09:04 mamccorm

Hi @mamccorm,

I just wanted to clarify that the DCGM package doesn't rely on any Cuda packages. All the necessary components are linked or provided by the DCGM package itself and loaded at runtime based on the detected driver. It's important to note that the DCGM 3.x branch supports drivers from R418 onward, so even though Cuda10 is EOL, we cannot remove it from the package.
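
To illustrate the idea of picking a bundled component at runtime from the detected driver, here is a minimal sketch. It is not DCGM's actual loader: the function name, the `plugin_cudaNN` naming, and the "highest bundled major not exceeding the driver's" rule are all assumptions made for illustration. The only detail taken from this thread is that three major Cuda versions (10, 11, 12) are bundled.

```python
# Hypothetical sketch only: not DCGM's real loader, plugin names, or selection rule.

# Three bundled major versions, per the policy mentioned in this thread.
BUNDLED_CUDA_MAJORS = (10, 11, 12)


def select_plugin(driver_cuda_major: int) -> str:
    """Return a (hypothetical) plugin name for the detected driver.

    Assumed rule: use the highest bundled CUDA major version that does not
    exceed the major version the detected driver supports.
    """
    candidates = [m for m in BUNDLED_CUDA_MAJORS if m <= driver_cuda_major]
    if not candidates:
        raise RuntimeError(
            f"no bundled plugin for a driver supporting CUDA {driver_cuda_major}"
        )
    return f"plugin_cuda{max(candidates)}"
```

Under this assumed rule, a driver limited to CUDA 10 (such as R418) can only be served by the Cuda10 component, which is why dropping it would end support for those drivers.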

nikkon-dev avatar Apr 22 '24 20:04 nikkon-dev

Thanks @nikkon-dev. So no cuda10 runtime dep, but the build process requires cuda10/11/12 to enable building of each plugin from source for their respective cuda versions?

The redist packages for cuda10 are no longer published, and outside of debian/ubuntu, there may be no existing prebuilt cuda10 package to pull (which is the issue we're facing when there's a build-time dep in DCGM).

In any event, this has been helpful, and I look forward to future releases.

mamccorm avatar Apr 23 '24 11:04 mamccorm

Cuda10 was removed from the OSS builds.

nikkon-dev avatar Sep 09 '24 18:09 nikkon-dev