Irritating log output "libnvidia-ml.so.545.29.06 ... wrong ELF class: ELFCLASS32"

Open shoffmeister opened this issue 2 years ago • 3 comments

When starting ollama, irritating log output is emitted complaining about "wrong ELF class: ELFCLASS32"; the full output is below.

I suspect that the working copy of libnvidia-ml is eventually found, but that does not appear in the logs.

As such, this is very irritating.

I'd suggest emitting a "Successfully loaded CUDA management library /usr/lib64/libnvidia-ml.so.545.29.06" message to the logs to balance out the earlier error entry.

2024/01/27 07:32:28 gpu.go:282: INFO Discovered GPU libraries: [/usr/lib/libnvidia-ml.so.545.29.06 /usr/lib64/libnvidia-ml.so.545.29.06]
2024/01/27 07:32:28 gpu.go:294: INFO Unable to load CUDA management library /usr/lib/libnvidia-ml.so.545.29.06: Unable to load /usr/lib/libnvidia-ml.so.545.29.06 library to query for Nvidia GPUs: /usr/lib/libnvidia-ml.so.545.29.06: wrong ELF class: ELFCLASS32
2024/01/27 07:32:28 gpu.go:99: INFO Nvidia GPU detected

shoffmeister, Jan 27 '24 06:01

@shoffmeister and @dhiltgen I guess you are seeing this issue on a Linux machine, possibly in AWS. I ran into the same thing, and after some online research I concluded that it is most probably caused by Ollama using the 32-bit version of the NVIDIA libraries on a 64-bit system. I am not sure whether this actually affects GPU usage, but I did experience slow responses while these log messages were appearing.

My current workaround is to replace the 32-bit library with the 64-bit library, and I no longer get any of these log messages. I also have not experienced any slow responses since then, although I cannot be sure that this is the reason, since these were not isolated observations or controlled tests.

I would suggest that the Linux version of Ollama prefer the 64-bit libraries whenever they are available.

NanisTe, Feb 05 '24 09:02

As mentioned, I don't think it is a functional issue.

This smells as if something is scanning the library path (in absolutely the right order) for matching libraries and probing them. For me, the 32-bit library is hit first and triggers the diagnostic; then the 64-bit library is hit, and things work.

I haven't taken a look at any of the code to support my gut feeling, though.

To de-irritate, all it would take is a "Successfully ..." message; but since I don't know whether ollama itself does the scanning, I don't know whether this is actually actionable on the ollama side.

FWIW, this is happening on my local Fedora Linux machine.

shoffmeister, Feb 05 '24 13:02

We have some other PRs in flight that may transition us off of nvidia-ml and over to the cudart libraries instead. If those work out, the code in question that's generating this warning will be removed.

dhiltgen, Feb 05 '24 23:02

We've merged the change that switches to using the cudart library first and falls back to the NVIDIA management library only if that doesn't work out. The longer-term goal is to remove the management library dependency entirely, but we'll wait a few releases to make sure the cudart approach doesn't have any corner cases we missed. I think we can consider this issue resolved in 0.1.30.

dhiltgen, Mar 27 '24 19:03