open-gpu-kernel-modules

Fixes for non-standard Arm SoC PCIe integrations

Open mariobalanica opened this issue 1 week ago • 17 comments

This patchset attempts to address a number of limitations present in commonly available Arm SoCs:

  • lack of I/O cache-coherency (bus snooping)
  • no support for write-combined MMIO mappings (used for the VRAM BAR)

Tested on RK3588 (has all issues above) and CIX P1 (no issues, SBSA-compliant) with an RTX 3050 8 GB.

Most things I've tried (Steam games, benchmarks, monitoring tools, CUDA) work fine now.

See https://github.com/mariobalanica/arm-pcie-gpu-patches/issues/2 for related discussion and demos of the driver running.

Side note: there's currently no Arm userspace release for driver version 580.105.08, so you'll need to stick with 580.95.05.

mariobalanica avatar Nov 23 '25 21:11 mariobalanica

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Nov 23 '25 21:11 CLAassistant

@mariobalanica - Thanks for posting the PR. I'm trying to test it on a CM5 with an Nvidia A4000, which is detected via lspci. I built from your branch, but the module won't load. Maybe I'm doing something wrong here?

  1. Flash Pi OS 13 'Trixie' to a new boot drive using Raspberry Pi Imager
  2. Boot the Pi and run sudo apt update && sudo apt upgrade -y to make sure you're on the latest versions
  3. Reboot the Pi: sudo reboot

Driver install:

cd ~/Downloads && git clone --branch non-coherent-arm-fixes https://github.com/mariobalanica/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
make modules -j$(nproc)
make modules_install -j$(nproc)

Download 580.95.05 aarch64 driver from: https://www.nvidia.com/en-us/drivers/unix/

Install without kernel modules (to not write over the ones we just built):

sudo sh ./NVIDIA-Linux-aarch64-580.95.05.run --no-kernel-modules

After that completes and I reboot, the module does not load. Attempting to load it manually fails as well:

jgeerling@cm5:~/Downloads/open-gpu-kernel-modules $ sudo modprobe nvidia
modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.12.47+rpt-rpi-2712

jgeerling@cm5:~/Downloads/open-gpu-kernel-modules $ ls /lib/modules/6.12.47+rpt-rpi-2712/kernel/drivers/video/
backlight  fbdev  nvidia-drm.ko.xz  nvidia.ko.xz  nvidia-modeset.ko.xz  nvidia-peermem.ko.xz  nvidia-uvm.ko.xz

Am I doing something wrong here or missing a step?

EDIT: I was not updating the module database: sudo depmod -a
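
The behaviour above follows from how modprobe resolves names: it consults the modules.dep index that depmod generates rather than scanning the module directory, so a .ko file can exist on disk and still be "not found". A minimal sketch of that lookup, assuming the plain-text modules.dep format ("path: dependencies"); the helper names and the sample index contents are hypothetical:

```python
import re

# A module file ends in .ko, optionally followed by a compression extension.
_MODULE_SUFFIX = re.compile(r"\.ko(\.(xz|gz|zst))?$")


def normalize(name: str) -> str:
    """Module names treat '-' and '_' as equivalent; compare in one form."""
    return name.replace("-", "_")


def indexed_modules(modules_dep_text: str) -> set[str]:
    """Return the module names that appear as keys in a modules.dep file."""
    names = set()
    for line in modules_dep_text.splitlines():
        path, sep, _deps = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        filename = path.strip().rsplit("/", 1)[-1]
        names.add(normalize(_MODULE_SUFFIX.sub("", filename)))
    return names


def is_indexed(module: str, modules_dep_text: str) -> bool:
    """True if the module is present in the index text."""
    return normalize(module) in indexed_modules(modules_dep_text)


# Hypothetical index contents before and after running `sudo depmod -a`.
before = "kernel/drivers/video/backlight/lcd.ko.xz:\n"
after = before + (
    "kernel/drivers/video/nvidia.ko.xz:\n"
    "kernel/drivers/video/nvidia-modeset.ko.xz: kernel/drivers/video/nvidia.ko.xz\n"
)

print(is_indexed("nvidia", before))
print(is_indexed("nvidia", after))
print(is_indexed("nvidia_modeset", after))
```

To run the same check against a real system, pass in the contents of /lib/modules/<kernel release>/modules.dep for the running kernel.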

Now I'm getting the module to load, but I still wind up with the classic 'RmInitAdapter failed!' error:

[    9.268222] NVRM: Chipset not recognized (vendor ID 0x14e4, device ID 0x2712)
[    9.268230] The NVIDIA GPU driver for AArch64 has not been qualified on this platform
               and therefore it is not recommended or intended for use in any production
               environment.
[   10.809142] NVRM: memmgrAllocResources_IMPL: memmgrDeterminePageSize failed, status: 0x40
[   10.809151] NVRM: nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from memmgrAllocResources(pGpu, pMemoryManager, pAllocRequest, pFbAllocInfo) @ system_mem.c:698
[   10.809156] NVRM: nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from pRmApi->AllocWithHandle(pRmApi, pChannel->hClient, hDevice, hPhysMem, hClass, &memAllocParams, sizeof(memAllocParams)) @ mem_utils_gm107.c:302
[   10.810856] NVRM: nvAssertOkFailedNoLog: Assertion failed: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057) returned from kchannelGetNotifierInfo(pGpu, pDevice, pKernelChannel->hErrorContext, &pKernelChannel->pErrContextMemDesc, &pKernelChannel->errorContextType, &pKernelChannel->errorContextOffset) @ kernel_channel.c:526
...
[   16.833048] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from scrubberConstruct(pGpu, pHeap) @ mem_mgr_scrub_gp100.c:60
[   16.833052] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from memmgrScrubHandlePostSchedulingEnable_HAL(pGpu, pMemoryManager) @ mem_mgr.c:487
[   16.833056] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_fifo.c:3129
[   16.833063] NVRM: RmInitNvDevice: *** Cannot load state into the device
[   16.833065] NVRM: RmInitAdapter: RmInitNvDevice failed, bailing out of RmInitAdapter
[   16.835308] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:346
[   17.057567] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:346
[   17.057628] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x10100
[   17.057641] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x25:0x40:1236)
[   17.058087] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
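
Most of the lines in a log like this are follow-on failures, so it helps to pull out the status name, numeric code and source location from each one and read them in order. A rough sketch, assuming the line layout seen above ("[NV_ERR_...] (0x...)" followed by a trailing "@ file.c:line"); the function name is hypothetical and the regular expressions are only as general as these samples:

```python
import re

# Matches fragments such as "[NV_ERR_INVALID_STATE] (0x00000040)".
_STATUS = re.compile(r"\[(NV_ERR_[A-Z0-9_]+)\]\s+\((0x[0-9a-fA-F]+)\)")
# Matches the trailing "@ file.c:123" source location.
_LOCATION = re.compile(r"@ (\S+):(\d+)\s*$")


def parse_nvrm_line(line: str):
    """Return (status_name, status_code, source_file, source_line) or None."""
    status = _STATUS.search(line)
    location = _LOCATION.search(line)
    if not status or not location:
        return None
    return (
        status.group(1),
        int(status.group(2), 16),
        location.group(1),
        int(location.group(2)),
    )


# One line copied from the log above.
sample = (
    "[   16.833048] NVRM: nvAssertOkFailedNoLog: Assertion failed: "
    "Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) "
    "returned from scrubberConstruct(pGpu, pHeap) @ mem_mgr_scrub_gp100.c:60"
)
print(parse_nvrm_line(sample))
```

Lines that carry only a bare status value, such as the first memmgrDeterminePageSize message, return None here, and that earliest line is usually the one worth reading first.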

geerlingguy avatar Nov 26 '25 16:11 geerlingguy

[ 10.809142] NVRM: memmgrAllocResources_IMPL: memmgrDeterminePageSize failed, status: 0x40

Is the kernel built with a non-4K page size?

mariobalanica avatar Nov 26 '25 19:11 mariobalanica

@mariobalanica - Ah yes... this is the Pi default kernel, 16K page size. I can try switching to a 4K kernel.

jgeerling@cm5:~ $ getconf PAGESIZE
16384

jgeerling@cm5:~ $ sudo nano /boot/firmware/config.txt
# Add to bottom
kernel=kernel8.img

jgeerling@cm5:~ $ sudo reboot
...
jgeerling@cm5:~ $ getconf PAGESIZE
4096
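
The same check can be scripted before building. A small sketch that reads the page size of the running kernel and maps it to what has been reported in this thread; the table holds observations from the comments here, not an official support matrix, and the names are hypothetical:

```python
import mmap

# Observations reported in this thread for driver 580.95.05 (not a support matrix).
REPORTED = {
    4096: "driver loads (CM5 on a 4K kernel)",
    16384: "memmgrDeterminePageSize fails on unpatched 580.95.05",
    65536: "reported working on a Thelio Astra",
}


def describe_page_size(page_size: int) -> str:
    """Summarise what this thread reports for a given CPU page size in bytes."""
    kib = page_size // 1024
    note = REPORTED.get(page_size, "not discussed in this thread")
    return f"{kib}K pages: {note}"


# mmap.PAGESIZE is the page size of the running system, like `getconf PAGESIZE`.
print(describe_page_size(mmap.PAGESIZE))
```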

geerlingguy avatar Nov 26 '25 21:11 geerlingguy

@mariobalanica - Okay, building off the 4K kernel gets me a loaded driver:

[    3.720807] nvidia: loading out-of-tree module taints kernel.
[    3.731762] nvidia-nvlink: Nvlink Core is being initialized, major device number 507
[    3.739304] nvidia 0001:01:00.0: enabling device (0000 -> 0002)
[    3.739445] nvidia 0001:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    3.838549] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for aarch64  580.95.05  Release Build  (jgeerling@cm5)  Wed 26 Nov 16:10:28 CST 2025
[    3.845587] [drm] [nvidia-drm] [GPU ID 0x00010100] Loading driver
[    3.845696] [drm] Initialized nvidia-drm 0.0.0 for 0001:01:00.0 on minor 2

However, displays connected to the DisplayPort outputs don't seem to get any signal.

But nvidia-smi does work:

jgeerling@cm5:~ $ nvidia-smi
Wed Nov 26 16:16:19 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A4000               Off |   00000001:01:00.0 Off |                  Off |
| 41%   41C    P8              7W /  140W |       1MiB /  16376MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Vulkan info:

$ DISPLAY=:0 vulkaninfo --summary
Vulkan Instance Version: 1.4.309

Devices:
========
GPU0:
	apiVersion         = 1.4.312
	driverVersion      = 580.95.5.0
	vendorID           = 0x10de
	deviceID           = 0x24b0
	deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
	deviceName         = NVIDIA RTX A4000
	driverID           = DRIVER_ID_NVIDIA_PROPRIETARY
	driverName         = NVIDIA
	driverInfo         = 580.95.05
	conformanceVersion = 1.4.1.3
	deviceUUID         = 8571e0b8-bbc4-4e4d-ccb9-a8100b67850a
	driverUUID         = b92269a1-b525-5615-ab8a-e2095ee37192

I tried compiling llama.cpp with CUDA, but got:

-- Could not find nvcc, please set CUDAToolkit_ROOT.
CMake Error at ggml/src/ggml-cuda/CMakeLists.txt:190 (message):
  CUDA Toolkit not found

So I switched to Vulkan to see if acceleration is working:

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = NVIDIA RTX A4000 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 1 = V3D 7.1.10.2 (V3DV Mesa) | uma: 1 | fp16: 0 | bf16: 0 | warp size: 16 | shared memory: 16384 | int dot: 0 | matrix cores: none

And it's definitely accelerated... NICE! More info here https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/692#issuecomment-3583473917

Do you know if there are any other tricks to getting display output? On other cards where Mesa wasn't happy but the drivers worked, I would at least get output with a flashing cursor, and I could press Alt + F2 to get to console. Here I'm not even seeing that...

geerlingguy avatar Nov 26 '25 22:11 geerlingguy

Not sure, I didn't have to do anything special to get HDMI out. What does drm_info say?

mariobalanica avatar Nov 27 '25 00:11 mariobalanica

@mariobalanica -

jgeerling@cm5:~ $ ls -lah /dev/dri
total 0
drwxr-xr-x   3 root root        160 Nov 27 09:57 .
drwxr-xr-x  17 root root       4.5K Nov 27 09:57 ..
drwxr-xr-x   2 root root        140 Nov 27 09:57 by-path
crw-rw----+  1 root video  226,   0 Nov 27 09:57 card0
crw-rw----+  1 root video  226,   1 Nov 27 09:57 card1
crw-rw----+  1 root video  226,   2 Nov 27 09:57 card2
crw-rw----+  1 root render 226, 128 Nov 27 09:57 renderD128
crw-rw----+  1 root render 226, 129 Nov 27 09:57 renderD129
jgeerling@cm5:~ $ drm_info
drmModeGetResources: Operation not supported
Failed to retrieve information from /dev/dri/card2
drmModeGetResources: Operation not supported
Failed to retrieve information from /dev/dri/card0
Node: /dev/dri/card1
├───Driver: vc4 (Broadcom VC4 graphics) version 0.0.0 (0)
...

From dmesg during boot:

[    3.908829] [drm] [nvidia-drm] [GPU ID 0x00010100] Loading driver
[    3.908970] [drm] Initialized nvidia-drm 0.0.0 for 0001:01:00.0 on minor 2
jgeerling@cm5:~ $ lsmod | grep nvidia
nvidia_uvm           1622016  0
nvidia_drm            122880  2
nvidia_modeset       1912832  1 nvidia_drm
nvidia              14671872  7 nvidia_uvm,nvidia_modeset
drm_ttm_helper         16384  1 nvidia_drm
drm_kms_helper        229376  5 drm_dma_helper,vc4,drm_shmem_helper,drm_ttm_helper,nvidia_drm
drm                   675840  21 gpu_sched,drm_kms_helper,drm_dma_helper,v3d,vc4,drm_shmem_helper,drm_display_helper,nvidia,drm_ttm_helper,nvidia_drm,ttm
backlight              24576  3 drm_kms_helper,drm,nvidia_modeset

geerlingguy avatar Nov 27 '25 16:11 geerlingguy

Any chance to support 1070 Ti?

pj1976 avatar Nov 28 '25 00:11 pj1976

I'm fairly certain the 1070 doesn't have the GSP firmware required to work with the open-gpu-kernel-modules — see, similarly, the 750 Ti: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/26#issuecomment-3586566167

geerlingguy avatar Nov 28 '25 03:11 geerlingguy

Thanks!

pj1976 avatar Nov 28 '25 12:11 pj1976

So do we have to peg to 4K to use this? Does anyone have an estimate of the work to port the driver to 16K, or is this about the firmware? (I can't imagine why the CPU page size would affect the driver.)

Lewiscowles1986 avatar Nov 28 '25 13:11 Lewiscowles1986

I've now tested an RTX A4000, 750 Ti, and 3080 Ti on a Raspberry Pi CM5 running Debian Trixie. The 750 Ti (referenced above) doesn't have GSP firmware so can't work with the open driver.

For the other two, the behavior was the same:

  • No errors in dmesg, no indication of anything not working in nvidia-smi
  • Works great for llama.cpp and GPU compute purposes
  • No output through DisplayPort on any plug (or on 3080 Ti's HDMI port), along with no indication nvidia-drm is working/not working that I can see...

I tested everything on both cards on a separate Intel x86 system on my bench, and both HDMI and DisplayPort outputs worked fine on that setup (running Ubuntu 25.10, using the same version of the driver that I've installed on the Pi setup).

geerlingguy avatar Nov 28 '25 21:11 geerlingguy

@geerlingguy, do you have a RK3588 board you can test this card with? It's not going to work just yet as I still have to push some firmware changes (likely by the end of this week), but I'm curious whether you can reproduce the issue there with a mainline kernel.

Any chance to support 1070 Ti?

@pj1976, the open driver variant only supports newer GSP-capable cards - so no, not directly. If this patchset gets merged, there's a chance the fixes could also land in the proprietary driver. But it's also possible that the legacy firmware code has similar issues, and since we have no public sources for that, it would require special attention from NVIDIA.

For older cards, you can try nouveau with these kernel patches: https://github.com/mariobalanica/arm-pcie-gpu-patches/tree/nvidia-wip/linux/6.17

YMMV though, as nouveau does no reclocking for older cards and so you're probably not going to get a lot of performance out of it.

So do we have to peg to 4K to use this? Does anyone have an estimate of the work to port the driver to 16K, or is this about the firmware? (I can't imagine why the CPU page size would affect the driver.)

@Lewiscowles1986, the driver's memory manager layer complained about the page size. I'd recommend opening another issue for that.

mariobalanica avatar Nov 28 '25 22:11 mariobalanica

@mariobalanica

the driver's memory manager layer complained about the page size. I'd recommend opening another issue for that.

Note that I have tested the open and proprietary drivers with a few cards on the Thelio Astra, with 64K page size, and that worked fine, so I wonder if there are special cases for Ampere in Nvidia's drivers...

geerlingguy avatar Nov 28 '25 22:11 geerlingguy

Looks like it just doesn't currently support a 16K page size: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/580.95.05/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c#L1935

mariobalanica avatar Nov 28 '25 23:11 mariobalanica

Looks like it just doesn't currently support a 16K page size: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/580.95.05/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c#L1935

Do you think it could be as easy as

    case RM_PAGE_SIZE_8K:
        *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _8KB, *pRetAttr);
        break;

    case RM_PAGE_SIZE_16K:
        *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _16KB, *pRetAttr);
        break;

Surely not?

I'm assuming the same patch could wrap the new constants in #ifndef guards and define them there. I'm not asking for this to be part of your patch, but this kind of thing always surprises and interests me.

Lewiscowles1986 avatar Nov 29 '25 07:11 Lewiscowles1986

Looks like it just doesn't currently support a 16K page size: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/580.95.05/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c#L1935

Do you think it could be as easy as

    case RM_PAGE_SIZE_8K:
        *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _8KB, *pRetAttr);
        break;

    case RM_PAGE_SIZE_16K:
        *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _16KB, *pRetAttr);
        break;

Surely not?

It's not. The GPU does not support a 16K (or 8K) page size.

That said, the driver works on a 16K system (M2 Ultra running Linux) with the following patch:

diff --git a/src/nvidia/inc/kernel/gpu/mem_mgr/rm_page_size.h b/src/nvidia/inc/kernel/gpu/mem_mgr/rm_page_size.h
index f25795ea..9639255d 100644
--- a/src/nvidia/inc/kernel/gpu/mem_mgr/rm_page_size.h
+++ b/src/nvidia/inc/kernel/gpu/mem_mgr/rm_page_size.h
@@ -36,6 +36,7 @@
 //---------------------------------------------------------------------------
 #define RM_PAGE_SIZE_INVALID 0
 #define RM_PAGE_SIZE         4096
+#define RM_PAGE_SIZE_16K     (16 * 1024)
 #define RM_PAGE_SIZE_64K     (64 * 1024)
 #define RM_PAGE_SIZE_128K    (128 * 1024)
 #define RM_PAGE_MASK         0x0FFF
diff --git a/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c b/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c
index 14d24956..41c0d5ec 100644
--- a/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c
+++ b/src/nvidia/src/kernel/gpu/mem_mgr/mem_mgr.c
@@ -1935,6 +1935,7 @@ memmgrDeterminePageSize_IMPL
     switch (pageSize)
     {
         case RM_PAGE_SIZE:
+        case RM_PAGE_SIZE_16K:
             *pRetAttr = FLD_SET_DRF(OS32, _ATTR, _PAGE_SIZE, _4KB, *pRetAttr);
             break;
 

There is some log spam, but the card works, including display output for a desktop session and glmark2/vkmark.

[  461.159024] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NULL != mmuWalkFindLevel(pWalk, pLevelFmt) @ mmu_walk_reserve.c:47
[  461.159072] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NULL != mmuWalkFindLevel(pWalk, pLevelFmt) @ mmu_walk_reserve.c:104
[  461.159118] kernel: NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping.
[  469.408029] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NULL != mmuWalkFindLevel(pWalk, pLevelFmt) @ mmu_walk_reserve.c:104
[  469.408111] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:5135
[  469.408137] kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:179
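
The idea behind the patch above can be modelled in a few lines: the GPU has no 16K page size, so a 16K CPU page is given the 4 KB page-size attribute and is then covered by several 4 KB GPU pages. This is only an illustration of the two switch cases visible in the diff, not the driver's code; the function names are hypothetical, and page sizes handled elsewhere in the real switch are deliberately left unmodelled:

```python
RM_PAGE_SIZE = 4096           # from rm_page_size.h, as shown in the diff
RM_PAGE_SIZE_16K = 16 * 1024  # constant added by the patch


def page_size_attr(page_size: int, patched: bool) -> str:
    """Model only the switch cases visible in the diff."""
    if page_size == RM_PAGE_SIZE:
        return "_4KB"
    if page_size == RM_PAGE_SIZE_16K:
        if patched:
            return "_4KB"  # the new case label shares the 4 KB branch
        # Stands in for the failure seen earlier in the thread (status 0x40).
        raise ValueError("NV_ERR_INVALID_STATE (0x40)")
    raise NotImplementedError("page size not modelled in this sketch")


def gpu_pages_per_cpu_page(cpu_page_size: int) -> int:
    """Number of 4 KB GPU pages needed to cover one CPU page."""
    return cpu_page_size // RM_PAGE_SIZE


print(page_size_attr(RM_PAGE_SIZE_16K, patched=True))
print(gpu_pages_per_cpu_page(RM_PAGE_SIZE_16K))
```

The assertions in the log above suggest that other parts of the driver still assume the two page sizes match, which this model does not attempt to capture.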

jannau avatar Nov 29 '25 09:11 jannau

Any hope of resurrecting old Jetson Nanos with a standard distro thanks to this, so as not to be bound to JetPack drivers for GPU acceleration?

Thanks!

Darkhub avatar Dec 01 '25 03:12 Darkhub

@geerlingguy, do you have a RK3588 board you can test this card with? It's not going to work just yet as I still have to push some firmware changes (likely by the end of this week), but I'm curious whether you can reproduce the issue there with a mainline kernel.

The latest EDK2 firmware build (https://github.com/edk2-porting/edk2-rk3588/actions) enables full support for NVIDIA cards on RK3588, without any kernel/DT patches.

Any hope of resurrecting old Jetson Nanos with a standard distro thanks to this, so as not to be bound to JetPack drivers for GPU acceleration?

Thanks!

@Darkhub, this PR does not enable GPUs that aren't already supported by the open driver variant. The Jetson Nano uses the older Maxwell architecture. See: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/19

mariobalanica avatar Dec 01 '25 12:12 mariobalanica

Should this work with an RTX 5000 Quadro?

Lewiscowles1986 avatar Dec 01 '25 21:12 Lewiscowles1986

Do all Nvidia cards work, including data center cards?

itsanirudhsrinivasan avatar Dec 03 '25 07:12 itsanirudhsrinivasan

@itsanirudhsrinivasan, not all cards will work; this is for a specific generation or a few generations. That information is contained above in this thread: https://github.com/NVIDIA/open-gpu-kernel-modules/pull/972#issuecomment-3596229205

Also, it seems that https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus lists compatible GPUs.

Lewiscowles1986 avatar Dec 03 '25 07:12 Lewiscowles1986

Oh, thanks @Lewiscowles1986

itsanirudhsrinivasan avatar Dec 03 '25 09:12 itsanirudhsrinivasan

Hi everyone,

I have an RPi 5 running a simple media server with docker-compose.

I’m considering connecting an NVIDIA RTX 3080 to it, powered by a 620W PSU, to handle 4K transcoding.

Is this setup even possible at the moment?

Thanks!

guytamari avatar Dec 03 '25 19:12 guytamari

How does this patch work? I don't know much about GPU acceleration at the kernel level, but I'm still curious.

itsanirudhsrinivasan avatar Dec 04 '25 14:12 itsanirudhsrinivasan