
Docker container for version 2.3.0 CUDA detection broken

Open JoeGonzalez0886 opened this issue 1 year ago • 1 comments

System Info

Running this container across multiple services produces an issue with CUDA GPU detection: no GPUs are detected.

  • Running Llama 3.1 from HF
  • Tried on the Runpod, local, and Novita platforms
  • GPUs tested: RTX 4090, A4500

Reverting to the container tagged :2.2.0 fixes the issue.

Just thought I would post this in case others are using 2.3.0 in production; an automated scaling process on our side instantiated the new container with the :latest tag and brought down our production systems.

Please take a look at this issue, team.

Thank you.

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [ ] An officially supported command
  • [ ] My own modifications

Reproduction

  1. Pull the latest 2.3.0 Docker image.
  2. Run with any LLM.
  3. It will fail to find the GPU.
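The steps above can be sketched as follows (the image path matches the official TGI registry, but the model ID and ports are placeholders for our setup — adjust for yours):

```shell
# Run the 2.3.0 image; on affected hosts the launcher
# reports that no CUDA devices were found.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.3.0 \
  --model-id meta-llama/Meta-Llama-3.1-8B-Instruct

# Sanity check that the host itself still sees the GPU:
nvidia-smi
```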

Expected behavior

We would expect this version to automatically detect the local CUDA GPU.

JoeGonzalez0886 avatar Sep 20 '24 23:09 JoeGonzalez0886

We ran into the same issue yesterday with our Docker launch scripts using the latest image tag. It looks like latest is pointing to the 2.3.0-rocm tag instead of 2.3.0.

Using a version-based tag addressed the issue.
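For anyone else hit by this, a minimal sketch of the workaround (model ID and ports are assumptions — substitute your own):

```shell
# Pin the CUDA build explicitly instead of relying on :latest,
# which currently resolves to the ROCm image.
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:2.2.0 \
  --model-id meta-llama/Meta-Llama-3.1-8B-Instruct

# Inspect what a tag actually points at before deploying:
docker manifest inspect ghcr.io/huggingface/text-generation-inference:latest
```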

antonpolishko avatar Sep 21 '24 18:09 antonpolishko