Cortex support for AMD GPUs
- [X] I have searched the existing issues
## Background
Currently, Jan only supports NVIDIA GPUs for acceleration. Users with AMD GPUs, particularly those using eGPUs on Intel Macs, are unable to utilize their graphics hardware for acceleration in Jan. This limits the performance potential for a segment of Jan users who have invested in AMD GPU solutions.
## Feature request
- [ ] https://github.com/janhq/cortex/issues/323
- [ ] https://github.com/janhq/jan/issues/2587
- [ ] https://github.com/janhq/docs/issues/15
- [x] https://github.com/janhq/jan/issues/3394
- [ ] https://github.com/janhq/jan/issues/3530
- [ ] https://github.com/janhq/jan/issues/4375
## Proposed Implementation
*For engineers to fill in*
## Additional Notes
Consider prioritizing support for popular AMD GPU models like the Vega series initially.
Commenting in support of this feature request
Please support discrete AMD GPUs on MacBook Pro 2017–2019.
Not able to use Jan with amd GPUs on Linux? Sad...
You can try Vulkan, but it's still slow. A Ryzen 7 2700 (CPU) is faster than an RX 6600 over Vulkan (using a Qwen 2.5 32B Q4 model).
AMD support is a great idea.
Hopefully ROCm support can be added. For those who don't know, you can enable the admittedly slower Vulkan support:
Go into Settings -> Advanced Settings and enable "experimental mode". Then make sure `linux-amd64-vulkan` is selected under the Local Engine settings tab; if not, just select it. You should then be running with GPU acceleration, though in my testing (I used LM Studio for ROCm on Linux) the Vulkan version seems slightly slower than ROCm. So it would be nice for ROCm to actually be officially supported in Jan.
Anyway, hope this helps :) I appreciate all the hard work the contributors put into this project. Please consider prioritizing this, as a lot of people are using AMD nowadays.
So, I was interested to see if my system would even benefit from ROCm support. When benchmarking llama.cpp on Linux using an RX 6800 XT, interestingly Vulkan is performing better than ROCm (at least for the output generation, which is the important part):
Vulkan:
```
llama-bench -m .config/Jan/data/models/huggingface.co/bartowski/deepcogito_cogito-v1-preview-qwen-14B-GGUF/deepcogito_cogito-v1-preview-qwen-14B-Q6_K_L.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (RADV NAVI21) (radv) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none
```
| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
| qwen2 14B Q6_K | 11.63 GiB | 14.77 B | Vulkan | 99 | pp512 | 394.77 ± 0.28 |
| qwen2 14B Q6_K | 11.63 GiB | 14.77 B | Vulkan | 99 | tg128 | 36.82 ± 0.09 |
ROCm:
```
llama-bench -m .config/Jan/data/models/huggingface.co/bartowski/deepcogito_cogito-v1-preview-qwen-14B-GGUF/deepcogito_cogito-v1-preview-qwen-14B-Q6_K_L.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 6800 XT, gfx1030 (0x1030), VMM: no, Wave Size: 32
```
| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
| qwen2 14B Q6_K | 11.63 GiB | 14.77 B | ROCm | 99 | pp512 | 633.67 ± 0.57 |
| qwen2 14B Q6_K | 11.63 GiB | 14.77 B | ROCm | 99 | tg128 | 31.97 ± 0.02 |
- llama.cpp version: 5186
- OS: Linux (NixOS), Kernel 6.14.3
- GPU: 1x RX 6800 XT
- Vulkan driver: RADV (Mesa 25.0.4)
- ROCm: 6.3.3
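For anyone who wants to reproduce this comparison on their own hardware, here is a rough sketch of building llama.cpp with each backend and running the same benchmark against both. The CMake flag names follow recent llama.cpp (older releases used `GGML_HIPBLAS` instead of `GGML_HIP`), the `gfx1030` target matches the RX 6800 XT above (adjust for your GPU), and `model.gguf` is a placeholder path:

```shell
# Build llama.cpp with the Vulkan backend (requires the Vulkan SDK/headers)
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release -j

# Build with the ROCm/HIP backend (requires a ROCm install; set your GPU arch)
cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030
cmake --build build-rocm --config Release -j

# Run the identical benchmark on each build and compare pp512 / tg128 t/s
./build-vulkan/bin/llama-bench -m model.gguf
./build-rocm/bin/llama-bench -m model.gguf
```

Running the same model file and the default pp512/tg128 tests keeps the two backends directly comparable, as in the tables above.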
On the other hand, I found a report where ROCm was outperforming Vulkan. The discrepancy could be due to different GPU models/generations, a different operating system, major improvements to the Vulkan implementation since build 3818, or something else. More benchmark comparisons like this would be needed to find out.
Also, acceleration for AMD GPUs works perfectly fine in Jan on Linux when the `linux-amd64-vulkan` engine backend is selected in the settings, and if the documentation is correct it is also supported on Windows. I propose changing the title and description of this issue, since the description is currently incorrect.
I just noticed that the backend names in the docs don't match the ones in Jan itself, and that the "Other Accelerators" header should be called "AMD GPU support" to match the NVIDIA one. That might be another reason why some people asked for general AMD GPU acceleration support here.