[BUG] NVIDIA yaml container needs CUDA 12.8 but docs say CUDA 12.2 or newer
Describe the bug
Cannot deploy the NVIDIA yaml because my host CUDA version is 12.4. According to your docs it should work with CUDA 12.2 or newer?
To Reproduce
Steps to reproduce the behavior:
- Use NVIDIA yaml
- Install on TrueNAS SCALE
- See error
Expected behavior
The container deploys successfully.
Environment (please complete the following information):
- OS: TrueNAS SCALE
- Deployment: Docker
- AudioMuse-AI Version: Latest NVIDIA
- Jellyfin/Navidrome Version: [e.g. Navidrome 0.57.0]
Additional context
```
unsatisfied condition: cuda>=12.8, please update your driver to a newer version, or use an earlier cuda container: unknown
[2025/12/01 08:09:52] (ERROR) app_lifecycle.compose_action():56 - Failed 'up' action for 'audiomuse' app:
 Network ix-audiomuse_default Creating
 Network ix-audiomuse_default Created
 Container audiomuse-postgres Creating
 Container audiomuse-redis Creating
 Container audiomuse-postgres Created
 Container audiomuse-redis Created
 Container audiomuse-ai-worker-instance Creating
 Container audiomuse-ai-flask-app Creating
 Container audiomuse-ai-flask-app Created
 Container audiomuse-ai-worker-instance Created
 Container audiomuse-redis Starting
 Container audiomuse-postgres Starting
 Container audiomuse-postgres Started
 Container audiomuse-redis Started
 Container audiomuse-ai-flask-app Starting
 Container audiomuse-ai-worker-instance Starting
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.8, please update your driver to a newer version, or use an earlier cuda container: unknown
```
Two further 'up' attempts (09:26:45 and 09:28:42) failed the same way, each ending in the identical `nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.8` message.
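For anyone hitting this on TrueNAS SCALE: the `cuda>=12.8` requirement is checked against the host NVIDIA driver, not against anything inside the container. A quick way to see what your driver supports is `nvidia-smi`; the "CUDA Version" in its header is the maximum CUDA runtime the installed driver can serve (the driver number in the comment below is only an illustrative example):

```sh
# On the TrueNAS host (not inside a container):
nvidia-smi
# The banner line looks roughly like:
#   NVIDIA-SMI 550.xx    Driver Version: 550.xx    CUDA Version: 12.4
# A CUDA 12.8.x image needs that "CUDA Version" field to read 12.8 or higher,
# which means updating the host driver rather than the container.
```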
Hi, I definitely need to update the documentation: the -nvidia container image is built with CUDA 12.8.1, so the host driver needs to support CUDA >= 12.8.
That's what I thought too. Unfortunately, that means it's not possible to run the container on the latest TrueNAS SCALE until, or rather if, the TrueNAS devs ship newer NVIDIA drivers with the next stable release.
Is it out of scope for you to create a container that supports earlier CUDA versions? I would like to use my RTX 3050 for the clustering process. :)
Ok, I had some outstanding OS updates which I have now applied, and I'm running CUDA 12.8 and launched the container successfully. 😀
Currently running some tests and not really seeing the results I had expected. I am using an RTX 3050 6GB card (not too powerful, I know), and it's powered via the PCIe slot only.
With a library of approx. 30K songs, the clustering has been running for 1h 25min and it's at 69%. That seems rather slow compared to standard CPU clustering?
Interestingly, when I monitor the GPU usage from the container, the memory usage never exceeds 200 MB while the GPU cores go above 90%. Is that normal behavior? I have a feeling the clustering falls back to CPU at times. USE_GPU_CLUSTERING is set to yes.
I also have not changed anything in the clustering parameters:
- Top Playlist number = 8
- Clustering runs = 5000
- K-Means specific Min = 40
- K-Means specific Max = 100
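One way to pin down the 200 MB / >90% observation above is to sample GPU utilization and memory together while a run is in progress. A minimal sketch, assuming `nvidia-smi` is available on the host (the query fields are standard `nvidia-smi` options; the output file name is arbitrary):

```sh
# Log GPU core utilization and memory use every 2 seconds during a clustering run
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
           --format=csv -l 2 | tee gpu_usage.csv
```

Sustained high utilization with memory staying near 200 MB would point to the GPU genuinely working on many small batches rather than falling back to CPU.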
Thanks for your feedback. I'm not a deep expert on the GPU implementation; maybe @rendyhd can tell us more on this. Anyway, from my understanding, using the GPU introduces the additional work of copying data from the CPU to the GPU before the code can execute. And in clustering you run many (5000 by default) small tasks, each with different data that needs to be copied.
I don't have measurements to back this up, but I tested on my 4050 with 6 GB of RAM and PCIe 4.0. If you're testing on PCIe 3.0, for example, it's entirely possible that the time gained by executing on the GPU is less than the time spent copying data between CPU and GPU.
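To check whether data transfer is actually the bottleneck on a given setup, one rough approach is to watch PCIe throughput next to GPU utilization during a clustering run. A sketch using `nvidia-smi dmon` (the `-s` letters select metric groups, with `u` for utilization and `t` for PCIe Rx/Tx throughput):

```sh
# Sample GPU utilization and PCIe throughput once per second while clustering runs
nvidia-smi dmon -s ut -d 1
# rxpci/txpci are reported in MB/s; sustained PCIe traffic with low sm utilization
# would suggest the run is dominated by host-to-device copies rather than compute.
```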
Thanks for your reply.
Further debugging reveals an issue with my PCIe link to the GPU, and I'm running at PCIe 1.0 speeds :)
LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
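Those look like `lspci -vv` output; for anyone wanting to repeat the check after hardware changes, a sketch (substitute the bus address that `lspci` reports for your GPU):

```sh
# Find the GPU's PCI address, then compare link capability with current status
lspci | grep -i nvidia
sudo lspci -s <bus:dev.fn> -vv | grep -iE 'LnkCap|LnkSta'
# Note: many GPUs drop the link speed at idle to save power,
# so read LnkSta while the card is under load.
```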
I'll debug further and let you know if anything changes.
Thanks again for your continued hard work. I love AudioMuse!!
So my SATA PCIe card was causing the GPU link to downgrade. Unfortunately, after removing the card, the GPU did run at full x16, but AudioMuse still only used about 200 MB of the card's memory. I would appreciate it if @rendyhd could assist.
Hi @Th3K1ngP1n, if you have 90% GPU usage, that's a good thing. You can always double-check that the GPU is being used in the worker logs; there should be messages like "GPU acceleration is available for clustering (RAPIDS cuML detected)", and I believe also when it completes cycles.
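A quick way to grep for those messages, assuming the worker container name shown in the compose log above (adjust it to whatever name TrueNAS uses on your system):

```sh
# Look for GPU / cuML related lines in the worker's logs
docker logs audiomuse-ai-worker-instance 2>&1 | grep -iE 'gpu|cuml|rapids'
```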
If you don't notice a difference in speed between CPU and GPU while the GPU is working, I'm currently thinking of two options:
- You have a very strong CPU and simply don't notice a big difference? I don't have a lot of experience with that.
- I wonder if you have other bottlenecks, like IO, database queries, or overhead jobs. Do you see any throttling happening anywhere else?
How big is the difference between CPU and GPU now, exactly, for 1000 or 5000 runs? Your LnkSta is 16 GT/s now?
For me, between a shared Intel i5 and a dedicated 4070 Ti Super, I noticed an 11x difference in speed.
Hi @rendyhd and thanks for your reply.
I have monitored the logs and the GPU is certainly being used.
I have a pretty standard CPU, it's a Ryzen 5 3600. At the moment I cannot say for certain how long the clustering took on CPU only; I think it was also around the 2-hour mark. To get the exact details I would need to revert to the default container and run the job again, which I can do tomorrow.
I don't see any other issues on my host. I'm running everything on TrueNAS SCALE, and the GPU is allocated to some other containers, Emby and Immich for example, but they are not using it during clustering, or hardly at all for that matter, only very rarely. It doesn't seem like there could be any other bottleneck besides the allocated lanes, which leads me to...
My link is back to x8 now because my system requires the other PCIe slot to be occupied by a SATA card. I did take the SATA card out momentarily earlier and ran the clustering with the GPU using x16 lanes; it appeared to run at the same speed as x8.
I feel that even with x8 lanes the GPU clustering should be quicker than the CPU, right?
Hi again, I let the clustering run overnight on CPU, and it did take a little longer. My final results, with the same configuration as mentioned above for both runs, are:
- CPU (Ryzen 5 3600): 00:03:44
- GPU (RTX 3050 6GB): 00:02:00 (the actual value was under 2 hours by a few minutes)
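(Reading those as hours and minutes, that is roughly 224 minutes on CPU versus a little under 120 minutes on GPU, i.e. only about a 1.9x speedup.)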
What do you think, @rendyhd? Are those times to be expected with such a GPU?
I am happy to test further if you can point me in the right direction.