
NVIDIA GPU Not Detected

Open • ncorder opened this issue 7 months ago • 10 comments

Hello everyone! First and foremost, my sincerest apologies if this is a known error or has already been posted somewhere. I searched around online but failed to find a remedy, and wanted to post here in case there is a larger issue.

[screenshot]

The issue is that I have an NVIDIA GPU, but the software does not appear to detect it. It instead claims I have an AMD GPU:

Here is my GPU information being printed:

[screenshot]

Thank you so much for all of your time helping make this software available. If there is anything else I can do to help resolve this issue, please do let me know.

Best wishes


$ flatpak run com.jeffser.Alpaca
/app/lib/python3.12/site-packages/pydbus/registration.py:130: DeprecationWarning: Gio.DBusConnection.register_object is deprecated
  ids = [bus.con.register_object(path, interface, wrapper.call_method, None, None) for interface in interfaces]

ncorder • Apr 22 '25

Looks like this is an issue with Ollama itself rather than Alpaca.

Could you share the output of nvidia-smi?

And please share more information: which distribution are you using? Which NVIDIA driver is installed? You should install nvidia-open rather than nvidia. Have you checked that the CUDA-related packages are installed?

8ar10der • Apr 24 '25

Are you running an AMD CPU with an iGPU?

mags0ft • Apr 25 '25

To get Alpaca to use my 3080, I had to go into the Ollama instance settings and set the option CUDA_VISIBLE_DEVICES to 0. That value may change if you have multiple GPUs.
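
For comparison, the same pin can be tried against a self-managed (non-Flatpak) Ollama from a shell; this is just a sketch, assuming the standard CLI install:

# Expose only the first CUDA device to Ollama; the index must match nvidia-smi -L
CUDA_VISIBLE_DEVICES=0 ollama serve

If the value points at the wrong device, Ollama falls back to the CPU, which can look exactly like the GPU not being detected.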

umaxtu • Apr 30 '25

Any news on this, @ncorder?

mags0ft • May 01 '25

> Looks like this is an issue with Ollama itself rather than Alpaca.
>
> Could you share the output of nvidia-smi?
>
> And please share more information: which distribution are you using? Which NVIDIA driver is installed? You should install nvidia-open rather than nvidia. Have you checked that the CUDA-related packages are installed?

Sorry for my delay in response! Here is my output from nvidia-smi:

Thu May  1 17:12:59 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5070 Ti     Off |   00000000:01:00.0  On |                  N/A |
| 33%   33C    P3             44W /  300W |    2551MiB /  16303MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2094      G   /usr/lib/xorg/Xorg                      338MiB |
|    0   N/A  N/A            3560      G   firefox                                 493MiB |
|    0   N/A  N/A           46191      G   /usr/bin/cheese                        1655MiB |
+-----------------------------------------------------------------------------------------+

ncorder • May 02 '25

> Are you running an AMD CPU with an iGPU?

Yes! My apologies for not stating this earlier. This machine is a System76 Thelio Mira and, I believe, does come with built-in AMD graphics. That makes much more sense regarding the error.

ncorder • May 02 '25

> ...with built-in AMD graphics

Then it is worth trying to set CUDA_VISIBLE_DEVICES in the instance settings, as @umaxtu mentioned above.

If you are not using the built-in Ollama, check this: https://github.com/ollama/ollama/issues/1813
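
To find the right value, listing the devices first helps; a quick check, assuming the NVIDIA userspace tools are installed:

# Prints one line per GPU; both the leading index and the UUID in parentheses
# are accepted by CUDA_VISIBLE_DEVICES
nvidia-smi -L

Even with an AMD iGPU present, the CUDA index is usually still 0, since CUDA only enumerates NVIDIA devices.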

8ar10der • May 02 '25

> ...with built-in AMD graphics
>
> Then it is worth trying to set CUDA_VISIBLE_DEVICES in the instance settings, as @umaxtu mentioned above.
>
> If you are not using the built-in Ollama, check this: ollama/ollama#1813

Yes, my apologies. I attempted the override and rebooted, but the error persists:

[screenshots]

Thank you all so much for the help

ncorder • May 02 '25

Hmm, this is kind of weird. I don't really see an obvious clue there.

Can you please navigate to Settings > Info about your Device > Graphics and show what's written there? If it contains, for example, an artist's name in capital letters (don't ask me why AMD does it like that 😭), that may be an indication your drivers didn't load correctly in the first place.

At least that's what has kept happening to me.

Another problem I had that may lead to this occurring is putting the PC to sleep, waking it up again, and then trying to use the same session of Ollama. You'd need to stop and restart the Ollama process, which would fix the issue for me.

If you also have a non-integrated installation of Ollama to try things out on,

sudo systemctl restart ollama

may do the trick. Otherwise, closing and opening Alpaca (with "Run Alpaca in the background" disabled, of course!) may change something, but I'm sure you already tried that.

It's hard to pinpoint anything clear.
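
If it helps narrow things down, Ollama reports what it detected on startup; a minimal check, assuming a systemd-managed install:

# The "inference compute" line names the GPU (or lack of one) Ollama settled on
journalctl -u ollama | grep "inference compute"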

mags0ft • May 02 '25

I have also been facing this issue for a good few months now. I have a dedicated NVIDIA GPU (no iGPU). Alpaca's Ollama instance doesn't pick up the NVIDIA GPU and thus uses the CPU for computation. It says "no compatible GPUs were discovered" while running. I tried setting CUDA_VISIBLE_DEVICES to 0 as well, which is the correct number; that didn't work even after restarting. Alpaca is now at 6.0, but the issue still persists.

If I install Ollama on my system, though, and try a model on the CLI, it picks up the GPU automatically without any manual configuration.

Would be happy to help debug this, let me know if any info is required.

OS: Fedora 42 (GNOME), Linux kernel: 6.14, NVIDIA driver: 570.153.02 (proprietary)

[screenshots]

adarsh1001 • May 25 '25

> The issue is that I have an NVIDIA GPU, but the software does not appear to detect it. It instead claims I have an AMD GPU. […]

Have you tried putting the UUID itself in CUDA_VISIBLE_DEVICES? This is how I am running it (see screenshot).

[screenshot]

b0ff3n • Jun 09 '25

Hi! It seems I now have the same issue: Fedora 42, kernel 6.15.3, NVIDIA RTX 4070, driver 575.64, Alpaca 6.1.7. Alpaca with managed Ollama used to work fine and used the dGPU, but this week it stopped doing so. I tried recreating the managed Ollama instance, but to no effect.

Tried different override configs as well, with no effect:

[screenshots]

Update: just updated to 7.0 and the same issue persists:

[screenshot]

Example log:

time=2025-06-28T11:55:40.421+03:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-06-28T11:55:42.193+03:00 level=WARN source=cuda_common.go:65 msg="old CUDA driver detected - please upgrade to a newer driver" version=0.0
time=2025-06-28T11:55:42.193+03:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-06-28T11:55:42.193+03:00 level=INFO source=amd_linux.go:332 msg="filtering out device per user request" id=0 visible_devices=[-1]
time=2025-06-28T11:55:42.193+03:00 level=INFO source=amd_linux.go:402 msg="no compatible amdgpu devices detected"
time=2025-06-28T11:55:42.260+03:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-45152f82-7a64-6aec-4425-e69f5be27103 library=cuda variant=v11 compute=8.9 driver=0.0 name="" total="7.6 GiB" available="7.5 GiB"

StoneToken • Jun 28 '25

It might be a problem with Ollama 0.9.3, see https://github.com/ollama/ollama/issues/11220#issuecomment-3015174660

StoneToken • Jun 28 '25

It might. I also have this problem now after updating to 7.0 / 0.9.3 (flatpak).

b0ff3n • Jun 28 '25

I'm the author of that issue thread (https://github.com/ollama/ollama/issues/11220); it is indeed a 0.9.3 issue. ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES need to be completely unset, or else the CUDA backend is not loaded. I have a PR open (https://github.com/ollama/ollama/pull/11234) to add a debug log and to allow an empty string, but it can't be -1. Hopefully it will get merged.
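
For anyone on a system (non-Flatpak) install who wants to reproduce the difference, here is a sketch following the behavior described above (an empty string still triggers the bug in 0.9.3; fully unset does not):

# Empty string: 0.9.3 still filters devices and the CUDA backend is skipped
ROCR_VISIBLE_DEVICES= HIP_VISIBLE_DEVICES= ollama serve

# Fully unset: the CUDA backend loads as expected
env -u ROCR_VISIBLE_DEVICES -u HIP_VISIBLE_DEVICES ollama serve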

jakehlee • Jun 30 '25

So the temporary solution would be to completely reinstall Alpaca to get the default settings for ROCR/HIP?

StoneToken • Jun 30 '25

> I'm the author of that issue thread (ollama/ollama#11220); it is indeed a 0.9.3 issue. ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES need to be completely unset, or else the CUDA backend is not loaded. I have a PR open (ollama/ollama#11234) to add a debug log and to allow an empty string, but it can't be -1. Hopefully it will get merged.

Could you provide details on how to do this?

So far I tried running bash inside the Alpaca sandbox:

flatpak run --branch=stable --arch=x86_64 --command=bash com.jeffser.Alpaca
[📦 com.jeffser.Alpaca concat]$ unset HIP_VISIBLE_DEVICES
[📦 com.jeffser.Alpaca concat]$ unset ROCR_VISIBLE_DEVICES
[📦 com.jeffser.Alpaca concat]$ alpaca

But I still get:

INFO	[main.py | main] Alpaca version: 7.0.1
INFO	[ollama_instances.py | start] Starting Alpaca's Ollama instance...
INFO	[ollama_instances.py | start] Started Alpaca's Ollama instance
Couldn't find '/home/vortexacherontic/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEGp4AyRxsbZZt8QsZvGgFD53cme866HgDeMkAm4O4fY

time=2025-07-06T14:18:34.254+02:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES:GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/vortexacherontic/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
INFO	[ollama_instances.py | start] client version is 0.9.3
time=2025-07-06T14:18:34.255+02:00 level=INFO source=images.go:476 msg="total blobs: 5"
time=2025-07-06T14:18:34.255+02:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-06T14:18:34.255+02:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11435 (version 0.9.3)"
time=2025-07-06T14:18:34.255+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-06T14:18:34.370+02:00 level=WARN source=cuda_common.go:65 msg="old CUDA driver detected - please upgrade to a newer driver" version=0.0
time=2025-07-06T14:18:34.370+02:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-07-06T14:18:34.371+02:00 level=INFO source=amd_linux.go:296 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
time=2025-07-06T14:18:34.371+02:00 level=INFO source=amd_linux.go:402 msg="no compatible amdgpu devices detected"
time=2025-07-06T14:18:34.426+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0 library=cuda variant=v11 compute=8.6 driver=0.0 name="" total="9.7 GiB" available="9.4 GiB"
[GIN] 2025/07/06 - 14:18:34 | 200 |     354.933µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/07/06 - 14:18:34 | 200 |   96.028754ms |       127.0.0.1 | POST     "/api/show"

It still logs "old CUDA driver detected - please upgrade to a newer driver" with version=0.0, while the installed driver is:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.169                Driver Version: 570.169        CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        On  |   00000000:01:00.0 Off |                  N/A |
|  0%   41C    P8             17W /  320W |      78MiB /  10240MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

and nvidia-smi -L gives:

GPU 0: NVIDIA GeForce RTX 3080 (UUID: GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0)

which is the UUID I set as CUDA_VISIBLE_DEVICES.

VortexAcherontic • Jul 06 '25

> It still logs "old CUDA driver detected - please upgrade to a newer driver" with version=0.0

Same issue here. I tried using:

flatpak override --unset-env=ROCR_VISIBLE_DEVICES --unset-env=HIP_VISIBLE_DEVICES com.jeffser.Alpaca && flatpak run com.jeffser.Alpaca

I pass the CUDA deviceQuery sample, so I think I have CUDA installed correctly and working, but it doesn't seem to load anything into my VRAM. NVIDIA-SMI 575.57.08, Driver Version: 575.57.08, CUDA Version: 12.9.
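
As a side note, flatpak override without --user edits system-wide overrides and needs root; the per-user variant, plus a reset in case the override should be undone later, would look like this (standard flatpak options):

flatpak override --user --unset-env=ROCR_VISIBLE_DEVICES --unset-env=HIP_VISIBLE_DEVICES com.jeffser.Alpaca

flatpak override --user --reset com.jeffser.Alpaca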

indianajonesilm • Jul 07 '25

I'll fix it, please be patient, people.

Jeffser • Jul 08 '25

This should fix the problem with empty ROCm-related env variables stopping CUDA from running.

https://github.com/Jeffser/Alpaca/commit/5e46a7e35f864f5fc38aa369bd882bbb5eeeefa6

Jeffser • Jul 09 '25

It is an Ollama (managed) issue. As a workaround:

1. Delete the managed instance (Ollama) from Alpaca.
2. Install Ollama with: curl -fsSL https://ollama.com/install.sh | sh
3. Start Alpaca and choose Ollama (the second choice, not the managed one).
4. Configure and connect to the instance (default value: http://127.0.0.1:11434; a quick check follows below).
5. Download a model using the console or Alpaca.
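
Before pointing Alpaca at it, it's worth confirming the system-wide instance is actually up; a quick check (/api/version is part of Ollama's HTTP API):

# Should return a small JSON object with the server version
curl http://127.0.0.1:11434/api/version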

This worked for me! Alpaca is now using my GPU.

rebhilotfi • Jul 12 '25

Alpaca 7.5.0 is working great! It now loads models into GPU VRAM and is lightning fast. Thanks Jeffser!

indianajonesilm • Jul 20 '25

> Alpaca 7.5.0 is working great! It now loads models into GPU VRAM and is lightning fast. Thanks Jeffser!

May I ask what you did to get it to work? I'm trying with the managed instance of Ollama and no dice (RTX 3090 on Fedora/Bazzite).

b0ff3n • Jul 21 '25

> May I ask what you did to get it to work? I'm trying with the managed instance of Ollama and no dice (RTX 3090 on Fedora/Bazzite).

Try the following settings on Alpaca 7.5.0:

[screenshot]

ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES should be blank.

indianajonesilm • Jul 21 '25

> Try the following settings on Alpaca 7.5.0:
>
> [screenshot]
>
> ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES should be blank.

Finally working again! Thank you!!! 😀

b0ff3n • Jul 21 '25

I'm glad it's working. Should I close the issue, then?

Jeffser • Jul 21 '25