open-gpu-kernel-modules
Fixes for non-standard Arm SoC PCIe integrations
This patchset attempts to address a number of limitations present in commonly available Arm SoCs:
- lack of I/O cache-coherency (bus snooping)
- no support for write-combined MMIO mappings (used for the VRAM BAR)
Tested on RK3588 (has all issues above) and CIX P1 (no issues, SBSA-compliant) with an RTX 3050 8 GB.
Most things I've tried (Steam games, benchmarks, monitoring tools, CUDA) work fine now.
See https://github.com/mariobalanica/arm-pcie-gpu-patches/issues/2 for related discussion and demos of the driver running.
Side note: there's currently no Arm userspace release for driver version 580.105.08, so you'll need to stick with 580.95.05.
@mariobalanica - Thanks for posting the PR. I'm trying to test it on a CM5 with an Nvidia A4000, which is detected via lspci, but after building from your branch I'm not getting the module to load. Maybe I'm doing something wrong here?
- Flash Pi OS 13 'Trixie' to a new boot drive using Raspberry Pi Imager
- Boot the Pi and run sudo apt update && sudo apt upgrade -y to make sure you're on the latest versions
- Reboot the Pi: sudo reboot
Driver install:
cd ~/Downloads && git clone --branch non-coherent-arm-fixes https://github.com/mariobalanica/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
make modules -j$(nproc)
make modules_install -j$(nproc)
Download 580.95.05 aarch64 driver from: https://www.nvidia.com/en-us/drivers/unix/
Install without kernel modules (so we don't overwrite the ones we just built):
sudo sh ./NVIDIA-Linux-aarch64-580.95.05.run --no-kernel-modules
After that completes and I reboot, the module does not load. Attempting to load it manually:
jgeerling@cm5:~/Downloads/open-gpu-kernel-modules $ sudo modprobe nvidia
modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.12.47+rpt-rpi-2712
jgeerling@cm5:~/Downloads/open-gpu-kernel-modules $ ls /lib/modules/6.12.47+rpt-rpi-2712/kernel/drivers/video/
backlight fbdev nvidia-drm.ko.xz nvidia.ko.xz nvidia-modeset.ko.xz nvidia-peermem.ko.xz nvidia-uvm.ko.xz
Am I doing something wrong here or missing a step?
EDIT: I was not updating the module database: sudo depmod -a
Now I'm getting the module to load, but I still wind up with the classic 'RmInitAdapter failed!' error:
[ 9.268222] NVRM: Chipset not recognized (vendor ID 0x14e4, device ID 0x2712)
[ 9.268230] The NVIDIA GPU driver for AArch64 has not been qualified on this platform
and therefore it is not recommended or intended for use in any production
environment.
[ 10.809142] NVRM: memmgrAllocResources_IMPL: memmgrDeterminePageSize failed, status: 0x40
[ 10.809151] NVRM: nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from memmgrAllocResources(pGpu, pMemoryManager, pAllocRequest, pFbAllocInfo) @ system_mem.c:698
[ 10.809156] NVRM: nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from pRmApi->AllocWithHandle(pRmApi, pChannel->hClient, hDevice, hPhysMem, hClass, &memAllocParams, sizeof(memAllocParams)) @ mem_utils_gm107.c:302
[ 10.810856] NVRM: nvAssertOkFailedNoLog: Assertion failed: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057) returned from kchannelGetNotifierInfo(pGpu, pDevice, pKernelChannel->hErrorContext, &pKernelChannel->pErrContextMemDesc, &pKernelChannel->errorContextType, &pKernelChannel->errorContextOffset) @ kernel_channel.c:526
...
[ 16.833048] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from scrubberConstruct(pGpu, pHeap) @ mem_mgr_scrub_gp100.c:60
[ 16.833052] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from memmgrScrubHandlePostSchedulingEnable_HAL(pGpu, pMemoryManager) @ mem_mgr.c:487
[ 16.833056] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_fifo.c:3129
[ 16.833063] NVRM: RmInitNvDevice: *** Cannot load state into the device
[ 16.833065] NVRM: RmInitAdapter: RmInitNvDevice failed, bailing out of RmInitAdapter
[ 16.835308] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:346
[ 17.057567] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:346
[ 17.057628] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x10100
[ 17.057641] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x25:0x40:1236)
[ 17.058087] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
[ 10.809142] NVRM: memmgrAllocResources_IMPL: memmgrDeterminePageSize failed, status: 0x40
Is the kernel built with a non-4K page size?
@mariobalanica - Ah yes... this is the Pi default kernel, 16K page size. I can try switching to a 4K kernel.
jgeerling@cm5:~ $ getconf PAGESIZE
16384
jgeerling@cm5:~ $ sudo nano /boot/firmware/config.txt
# Add to bottom
kernel=kernel8.img
jgeerling@cm5:~ $ sudo reboot
...
jgeerling@cm5:~ $ getconf PAGESIZE
4096
@mariobalanica - Okay, building off the 4K kernel gets me a loaded driver:
[ 3.720807] nvidia: loading out-of-tree module taints kernel.
[ 3.731762] nvidia-nvlink: Nvlink Core is being initialized, major device number 507
[ 3.739304] nvidia 0001:01:00.0: enabling device (0000 -> 0002)
[ 3.739445] nvidia 0001:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 3.838549] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for aarch64 580.95.05 Release Build (jgeerling@cm5) Wed 26 Nov 16:10:28 CST 2025
[ 3.845587] [drm] [nvidia-drm] [GPU ID 0x00010100] Loading driver
[ 3.845696] [drm] Initialized nvidia-drm 0.0.0 for 0001:01:00.0 on minor 2
However, displays plugged into the DisplayPort outputs don't seem to get any signal.
But nvidia-smi does work:
jgeerling@cm5:~ $ nvidia-smi
Wed Nov 26 16:16:19 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX A4000 Off | 00000001:01:00.0 Off | Off |
| 41% 41C P8 7W / 140W | 1MiB / 16376MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Vulkan info:
$ DISPLAY=:0 vulkaninfo --summary
Vulkan Instance Version: 1.4.309
Devices:
========
GPU0:
apiVersion = 1.4.312
driverVersion = 580.95.5.0
vendorID = 0x10de
deviceID = 0x24b0
deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
deviceName = NVIDIA RTX A4000
driverID = DRIVER_ID_NVIDIA_PROPRIETARY
driverName = NVIDIA
driverInfo = 580.95.05
conformanceVersion = 1.4.1.3
deviceUUID = 8571e0b8-bbc4-4e4d-ccb9-a8100b67850a
driverUUID = b92269a1-b525-5615-ab8a-e2095ee37192
I tried compiling llama.cpp with CUDA, but got:
-- Could not find nvcc, please set CUDAToolkit_ROOT.
CMake Error at ggml/src/ggml-cuda/CMakeLists.txt:190 (message):
CUDA Toolkit not found
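(Presumably the fix is just what the error message says: pointing CMake's FindCUDAToolkit at an installed toolkit. Assuming an aarch64 CUDA toolkit actually exists under /usr/local/cuda — a hypothetical path, adjust to your install — a configure line like this would be the usual workaround:)

```shell
# Hypothetical: tell CMake where the CUDA toolkit lives (path varies per install)
cmake -B build -DGGML_CUDA=ON -DCUDAToolkit_ROOT=/usr/local/cuda
```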
So I switched to Vulkan to see if acceleration is working:
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = NVIDIA RTX A4000 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 1 = V3D 7.1.10.2 (V3DV Mesa) | uma: 1 | fp16: 0 | bf16: 0 | warp size: 16 | shared memory: 16384 | int dot: 0 | matrix cores: none
And it's definitely accelerated... NICE! More info here https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/692#issuecomment-3583473917
Do you know if there are any other tricks to getting display output? On other cards where Mesa wasn't happy but the drivers worked, I would at least get output with a flashing cursor, and I could press Alt + F2 to get to console. Here I'm not even seeing that...
Not sure, I didn't have to do anything special to get HDMI out. What does drm_info say?
@mariobalanica -
jgeerling@cm5:~ $ ls -lah /dev/dri
total 0
drwxr-xr-x 3 root root 160 Nov 27 09:57 .
drwxr-xr-x 17 root root 4.5K Nov 27 09:57 ..
drwxr-xr-x 2 root root 140 Nov 27 09:57 by-path
crw-rw----+ 1 root video 226, 0 Nov 27 09:57 card0
crw-rw----+ 1 root video 226, 1 Nov 27 09:57 card1
crw-rw----+ 1 root video 226, 2 Nov 27 09:57 card2
crw-rw----+ 1 root render 226, 128 Nov 27 09:57 renderD128
crw-rw----+ 1 root render 226, 129 Nov 27 09:57 renderD129
jgeerling@cm5:~ $ drm_info
drmModeGetResources: Operation not supported
Failed to retrieve information from /dev/dri/card2
drmModeGetResources: Operation not supported
Failed to retrieve information from /dev/dri/card0
Node: /dev/dri/card1
├───Driver: vc4 (Broadcom VC4 graphics) version 0.0.0 (0)
...
From dmesg during boot:
[ 3.908829] [drm] [nvidia-drm] [GPU ID 0x00010100] Loading driver
[ 3.908970] [drm] Initialized nvidia-drm 0.0.0 for 0001:01:00.0 on minor 2
jgeerling@cm5:~ $ lsmod | grep nvidia
nvidia_uvm 1622016 0
nvidia_drm 122880 2
nvidia_modeset 1912832 1 nvidia_drm
nvidia 14671872 7 nvidia_uvm,nvidia_modeset
drm_ttm_helper 16384 1 nvidia_drm
drm_kms_helper 229376 5 drm_dma_helper,vc4,drm_shmem_helper,drm_ttm_helper,nvidia_drm
drm 675840 21 gpu_sched,drm_kms_helper,drm_dma_helper,v3d,vc4,drm_shmem_helper,drm_display_helper,nvidia,drm_ttm_helper,nvidia_drm,ttm
backlight 24576 3 drm_kms_helper,drm,nvidia_modeset
Any chance to support 1070 Ti?
I'm fairly certain the 1070 doesn't have the GSP firmware required to work with the open-gpu-kernel-modules — see, similarly, the 750 Ti: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/26#issuecomment-3586566167
Thanks!
So do we have to peg to 4K to use this? Does anyone have an estimate on the work to port the driver to 16K, or is this about the firmware? (I can't imagine why the CPU page size would affect the driver.)
I've now tested an RTX A4000, 750 Ti, and 3080 Ti on a Raspberry Pi CM5 running Debian Trixie. The 750 Ti (referenced above) doesn't have GSP firmware so can't work with the open driver.
For the other two, the behavior was the same:
- No errors in dmesg, no indication of anything not working in nvidia-smi
- Works great for llama.cpp and GPU compute purposes
- No output through DisplayPort on any plug (or on the 3080 Ti's HDMI port), along with no indication nvidia-drm is working/not working that I can see...
I tested everything on both cards on a separate Intel x86 system on my bench, and both HDMI and DisplayPort outputs worked fine on that setup (running Ubuntu 25.10, using the same version of the driver that I've installed on the Pi setup).
@geerlingguy, do you have a RK3588 board you can test this card with? It's not going to work just yet as I still have to push some firmware changes (likely by the end of this week), but I'm curious whether you can reproduce the issue there with a mainline kernel.
Any chance to support 1070 Ti?
@pj1976, the open driver variant only supports newer GSP-capable cards - so no, not directly. If this patchset gets merged, there's a chance the fixes could also land in the proprietary driver. But it's also possible that the legacy firmware code has similar issues, and since we have no public sources for that, it would require special attention from NVIDIA.
For older cards, you can try nouveau with these kernel patches: https://github.com/mariobalanica/arm-pcie-gpu-patches/tree/nvidia-wip/linux/6.17
YMMV though, as nouveau does no reclocking for older cards and so you're probably not going to get a lot of performance out of it.
So do we have to peg to 4K to use this? Does anyone have an estimate on the work to port the driver to 16K, or is this about the firmware? (I can't imagine why the CPU page size would affect the driver.)
@Lewiscowles1986, the driver's memory manager layer complained about the page size. I'd recommend opening another issue for that.
@mariobalanica
the driver's memory manager layer complained about the page size. I'd recommend opening another issue for that.
Note that I have tested the open and proprietary drivers with a few cards on the Thelio Astra, with 64K page size, and that worked fine, so I wonder if there are special cases for Ampere in Nvidia's drivers...
Looks like it just doesn't currently support a 16K page size: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/580.95.05/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c#L1935
Looks like it just doesn't currently support a 16K page size: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/580.95.05/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c#L1935
Do you think it could be as easy as
case RM_PAGE_SIZE_8K:
    *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _8KB, *pRetAttr);
    break;
case RM_PAGE_SIZE_16K:
    *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _16KB, *pRetAttr);
    break;
Surely not?
I'm assuming the same patch could guard the new defines behind #ifndef checks. Not asking for it to be part of your patch, but this kind of thing always surprises and interests me.
Looks like it just doesn't currently support a 16K page size: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/580.95.05/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c#L1935
Do you think it could be as easy as
case RM_PAGE_SIZE_8K:
    *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _8KB, *pRetAttr);
    break;
case RM_PAGE_SIZE_16K:
    *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _16KB, *pRetAttr);
    break;
Surely not?
It's not. The GPU does not support a 16K (or 8K) page size.
That said, the driver works on a 16K system (M2 Ultra running Linux) with the following patch:
diff --git a/src/nvidia/inc/kernel/gpu/mem_mgr/rm_page_size.h b/src/nvidia/inc/kernel/gpu/mem_mgr/rm_page_size.h
index f25795ea..9639255d 100644
--- a/src/nvidia/inc/kernel/gpu/mem_mgr/rm_page_size.h
+++ b/src/nvidia/inc/kernel/gpu/mem_mgr/rm_page_size.h
@@ -36,6 +36,7 @@
//---------------------------------------------------------------------------
#define RM_PAGE_SIZE_INVALID 0
#define RM_PAGE_SIZE 4096
+#define RM_PAGE_SIZE_16K (16 * 1024)
#define RM_PAGE_SIZE_64K (64 * 1024)
#define RM_PAGE_SIZE_128K (128 * 1024)
#define RM_PAGE_MASK 0x0FFF
diff --git a/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c b/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c
index 14d24956..41c0d5ec 100644
--- a/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c
+++ b/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c
@@ -1935,6 +1935,7 @@ memmgrDeterminePageSize_IMPL
switch (pageSize)
{
case RM_PAGE_SIZE:
+ case RM_PAGE_SIZE_16K:
*pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _4KB, *pRetAttr);
break;
There is some log spam, but the card works, including display output for a desktop session and glmark2/vkmark.
[ 461.159024] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NULL != mmuWalkFindLevel(pWalk, pLevelFmt) @ mmu_walk_reserve.c:47
[ 461.159072] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NULL != mmuWalkFindLevel(pWalk, pLevelFmt) @ mmu_walk_reserve.c:104
[ 461.159118] kernel: NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping.
[ 469.408029] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NULL != mmuWalkFindLevel(pWalk, pLevelFmt) @ mmu_walk_reserve.c:104
[ 469.408111] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:5135
[ 469.408137] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:179
Any hope of resurrecting old Jetson Nanos with a standard distro thanks to this, so as not to be bound to the JetPack drivers for GPU acceleration?
Thanks!
@geerlingguy, do you have a RK3588 board you can test this card with? It's not going to work just yet as I still have to push some firmware changes (likely by the end of this week), but I'm curious whether you can reproduce the issue there with a mainline kernel.
The latest EDK2 firmware build (https://github.com/edk2-porting/edk2-rk3588/actions) enables full support for NVIDIA cards on RK3588, without any kernel/DT patches.
Any hope of resurrecting old Jetson Nanos with a standard distro thanks to this, so as not to be bound to the JetPack drivers for GPU acceleration?
Thanks!
@Darkhub, this PR does not enable GPUs that aren't already supported by the open driver variant. Jetson Nano is older Maxwell architecture. See: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/19
Should this work with an RTX 5000 Quadro?
Do all Nvidia cards work, including data center cards?
@itsanirudhsrinivasan not all cards will work; this is for a specific generation or a few generations. That information is contained above in this thread: https://github.com/NVIDIA/open-gpu-kernel-modules/pull/972#issuecomment-3596229205
Also, it seems like https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus lists compatible GPUs
Oh, thanks @Lewiscowles1986
Hi everyone,
I have an RPi 5 running a simple media server with docker-compose.
I’m considering connecting an NVIDIA RTX 3080 to it, powered by a 620W PSU, to handle 4K transcoding.
Is this setup even possible at the moment?
Thanks!
@guytamari see Nvidia Graphics Cards work on Pi 5 and Rockchip
How does this patch work? I by no means know about GPU acceleration at the kernel level, but I'm still curious.