Irritating log output "libnvidia-ml.so.545.29.06 ... wrong ELF class: ELFCLASS32"
When starting ollama, irritating log output is emitted complaining about "wrong ELF class: ELFCLASS32"; full content below.
I suspect that eventually the working copy of libnvidia-ml is found, but that does not appear in the logs.
As such, this is very irritating.
I'd suggest emitting a "Successfully loaded CUDA management library /usr/lib64/libnvidia-ml.so.545.29.06" entry to the logs to balance out the earlier problem entry.
2024/01/27 07:32:28 gpu.go:282: INFO Discovered GPU libraries: [/usr/lib/libnvidia-ml.so.545.29.06 /usr/lib64/libnvidia-ml.so.545.29.06]
2024/01/27 07:32:28 gpu.go:294: INFO Unable to load CUDA management library /usr/lib/libnvidia-ml.so.545.29.06: Unable to load /usr/lib/libnvidia-ml.so.545.29.06 library to query for Nvidia GPUs: /usr/lib/libnvidia-ml.so.545.29.06: wrong ELF class: ELFCLASS32
2024/01/27 07:32:28 gpu.go:99: INFO Nvidia GPU detected
@shoffmeister and @dhiltgen I guess you have this issue on a Linux machine, possibly in AWS. I ran into the same thing and, after some online research, concluded that it is most probably caused by Ollama trying to use the 32-bit version of the Nvidia libraries on a 64-bit system. I am not sure whether this actually affects GPU usage, but I did experience slow responses while these log entries were appearing.
My current workaround is to replace the 32-bit library with the 64-bit library, and I no longer get any of these log messages. I have also not experienced any slow responses since then, although I cannot be sure that this is the reason, since these were not isolated tests.
I would suggest that the Linux variant of Ollama prefer the 64-bit libraries whenever they are available.
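To illustrate that suggestion, here is a minimal sketch in Python (not Ollama's actual code, which I have not checked). It assumes the candidates can be classified by the EI_CLASS byte at offset 4 of the ELF header, where 1 means ELFCLASS32 and 2 means ELFCLASS64. The helper names `elf_class_of`, `elf_class`, and `prioritize` are made up for this example.

```python
ELFCLASS32 = 1  # EI_CLASS value for 32-bit objects
ELFCLASS64 = 2  # EI_CLASS value for 64-bit objects


def elf_class_of(ident):
    """Return the EI_CLASS byte from the first bytes of a file, or None if not ELF."""
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    return ident[4]


def elf_class(path):
    """Read just enough of `path` to classify it; None if unreadable or not ELF."""
    try:
        with open(path, "rb") as f:
            return elf_class_of(f.read(5))
    except OSError:
        return None


def prioritize(paths, wanted=ELFCLASS64, classify=elf_class):
    """Stable sort that moves candidates of the wanted ELF class to the front."""
    # False sorts before True, so matching libraries come first;
    # the original discovery order is kept within each group.
    return sorted(paths, key=lambda p: classify(p) != wanted)
```

With the two paths from the log above, `prioritize` would put /usr/lib64/libnvidia-ml.so.545.29.06 first on a typical 64-bit system, so the 32-bit copy would only be tried as a last resort.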
As mentioned, I don't think it is a functional issue.
This smells as if something is scanning the library path (in absolutely the right order) for matching libraries and probing them. For me, the 32-bit library is hit first and fails with the diagnostic, then the 64-bit library is hit, and things work.
I haven't taken a look at any of the code to support my gut feeling, though.
To de-irritate, all it would take is a "Successfully ..." message.
This is happening on my local Fedora Linux, FWIW.
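A rough sketch of the probing behaviour I am guessing at, with the extra success entry added. This is Python for illustration only and an assumption about the flow, not what gpu.go actually does. The function name `load_first` and the logger name are hypothetical; `ctypes.CDLL` raises `OSError` when the dynamic loader rejects a library, for example with "wrong ELF class: ELFCLASS32".

```python
import ctypes
import logging

log = logging.getLogger("gpu-probe")  # hypothetical logger name


def load_first(paths, loader=ctypes.CDLL):
    """Try each candidate in order; return (path, handle) for the first that loads."""
    for path in paths:
        try:
            handle = loader(path)
        except OSError as err:
            # A 32-bit library in a 64-bit process ends up here.
            log.info("Unable to load CUDA management library %s: %s", path, err)
            continue
        # The proposed balancing entry: say which candidate actually worked.
        log.info("Successfully loaded CUDA management library %s", path)
        return path, handle
    return None, None
```

With both a failure entry and a success entry in the log, it is obvious at a glance that the first message was only an intermediate probe and not a real problem.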
We have some other PRs in flight that may transition us off of nvidia-ml and over to the cudart libraries instead. If those work out, the code in question that's generating this warning will be removed.
We've merged the change that switches to leveraging the cudart library first, and only if that doesn't work out, falls back to the nvidia management library. The longer-term goal is to remove the management library dependency entirely, but we'll wait a few releases to make sure the cudart approach doesn't have any corner cases we missed. I think we can consider this issue resolved in 0.1.30.