Testing yanghaku's DRM patch for cache coherency on various graphics cards
I wanted to consolidate some of my testing into a single issue, as I'd like to see how well the simple fix from https://github.com/geerlingguy/raspberry-pi-pcie-devices/discussions/756 works for various graphics cards.
Test Setup
I have a Raspberry Pi 500+, and test graphics cards inside this JMT M.2 M-Key to OCuLink eGPU dock (model DOCK-OC4), using a chenyang SFF-8611 OCuLink cable. Photo of that part of the setup below:
For cards with HDMI and DisplayPort outputs, I plug HDMI directly into my 4K display. For cards with only DisplayPort, I plug DisplayPort directly into my 4K display.
I'll leave further details in the comments, but from a high level, here are cards that have been tested under Raspberry Pi OS rpi-6.15.y with that patch applied (click on the card name to go to the specific GitHub issue with more details on that card):
AMD (amdgpu)
| Card | glmark2 | vkmark | GravityMark | Notes |
|---|---|---|---|---|
| AMD Radeon RX 9070 XT 16GB | 8151 | 14946 | 56944 | Required newer version of Mesa |
| AMD Radeon RX 7900 XT 20GB | 8011 | 14319 | 65809 | Everything runs smoothly |
| AMD Radeon Pro W7700 16GB | 7589 | 12929 | 40782 | Everything runs smoothly |
| AMD Radeon RX 7600 8GB | 8074 | 14790 | 27861 | Everything runs smoothly |
| AMD Radeon RX 6700 XT 12GB | 7841 | 14855 | 36455 | GUI choppy until restarting lightdm |
| AMD Radeon RX 6500 XT 8GB | 7668 | 14185 | 14865 | GUI choppy until restarting lightdm |
| AMD Radeon RX 580 8GB | 7613 | 12753 | 12103 | Everything runs smoothly |
| AMD Radeon RX 460 4GB | 7269 | 10089 | 5739 | Everything runs smoothly |
AMD (radeon)
| Card | glmark2 | vkmark | GravityMark | Notes |
|---|---|---|---|---|
| AMD R5 230 2GB | 356 | 563 | DNF | Only X11 works, see linked issue |
| AMD Radeon HD 7470 1GB | 539 | 230 | DNF | Only X11 works, see linked issue |
Intel Arc
| Card | glmark2 | vkmark | GravityMark | Notes |
|---|---|---|---|---|
| Intel B580 Sparkle 12GB | 5622 | 10133 | 55209 | Artifacting, but better than before; see #695 |
| Intel Arc A750 | 4223 | 12313 | 25462 | Artifacting, but better than before; see #510 |
| Intel Arc Pro B50 | 5945 | 14316 | 18140 | Artifacting, but better than before; see #779 |
| Intel Arc A310 ECO | 4984 | 10885 | 7447 | Artifacting, same as above; see #778 |
Nvidia RTX / GTX
| Card | glmark2 | vkmark | GravityMark | Notes |
|---|---|---|---|---|
| Nvidia RTX A4000 | - | - | - | Get error dump on swiotlb_map |
| Nvidia GeForce GTX 750 Ti | - | - | - | Get error dump on swiotlb_map |
Commands used (representative example):
# glmark2
$ DISPLAY=:0 glmark2-es2-wayland
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: AMD
GL_RENDERER: AMD Radeon RX 6500 XT (radeonsi, navi24, LLVM 15.0.6, DRM 3.63, 6.15.11-v8-16k+)
GL_VERSION: OpenGL ES 3.2 Mesa 24.2.8-1~bpo12+rpt4
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 800x600 windowed
=======================================================
# vkmark
$ DISPLAY=:0 build/src/vkmark --winsys-dir=build/src --data-dir=data
=======================================================
vkmark 2025.01
=======================================================
Vendor ID: 0x1002
Device ID: 0x743F
Device Name: AMD Radeon RX 6500 XT (RADV NAVI24)
Driver Version: 100671496
Device UUID: 3ab6f656c2fadb58e7866ca2641bd4d6
=======================================================
# GravityMark
$ sudo ./GravityMark_1.88_arm64.run
# (Use defaults: 200k asteroids, 1600x900, windowed, Vulkan)
I don't mind testing my 3 cards with the new patches this weekend. I have the Radeon HD 7870, a GTX 1050 Ti, and an RTX 4060.
I haven't tried any Nvidia cards yet; I'm still assuming the deeper patchset is needed before those will work.
Interestingly, I was testing on a fresh Pi OS install today, and found the 7900 XT was only getting 64817 on PCIe Gen 2 (the default). After bumping to Gen 3, I got 65809.
Though, running the test four more times, the scores ranged from about 64,700 to 65,800.
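A quick sketch of the arithmetic behind that observation, in Python (chosen only for illustration; nothing in this thread uses it). It compares the Gen 2 to Gen 3 gain with the run-to-run spread, treating the "about 64,700 to 65,800" range as approximate bounds rather than exact measurements.

```python
# GravityMark scores quoted in the thread for the RX 7900 XT.
gen2_score = 64817
gen3_score = 65809

# Approximate range across the repeat runs (rounded figures from the thread).
run_low, run_high = 64700, 65800

gain = gen3_score - gen2_score          # absolute difference between the two runs
gain_pct = 100 * gain / gen2_score      # same difference as a percentage
spread = run_high - run_low             # run-to-run variation

print(f"Gen 2 -> Gen 3 gain: {gain} points ({gain_pct:.2f}%)")
print(f"Run-to-run spread:   {spread} points")
print("Gain is within run-to-run noise" if gain <= spread else "Gain exceeds noise")
```

With these figures, the gain is smaller than the spread, so on this test alone the link speed change can't be separated from normal variation.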
Seeing as the change is contained within CONFIG_DRM_TTM which isn't used by vc4 or v3d, the side effects of changing it on a Pi are very minimal.
For some reason the RP1 DSI, DPI, and VEC drivers are requesting it, but that may just be due to a bit too much copy/paste.
(I might dig out my NVidia GT710 quad HDMI output card tomorrow. Let's see how nouveau does with these changes)
@6by9 That'd be great, and I will also try to test with nouveau at some point; I just want to finish hammering out all the AMD cards I can get my hands on for testing right now.
I was also noticing on the RX 460 that @Coreforge and I both tested, his scores were mostly slightly higher than mine on GravityMark:
Could be there are some optimized paths in the amdgpu-driver-specific changes that aren't accounted for with the DRM-wide changes; but the difference is relatively small.
Are you guys using the same card or is Coreforge using a slightly overclocked card? I know my specific card is factory overclocked 50 MHz over what the Radeon HD 7870 GHz edition is rated for (what's the base clock? Idk it's definitely not in the name of the card). Just a thought.
Also 6by9, I fully intend to test nouveau's codepaths later on both of my NVIDIA cards, since one of the main reasons I even came here (other than my long-standing fascination with Jeff's quest to get GPUs working on the Pi) is that I want to help get all 3 vendors' cards working. I won't rest until that's done... even if most of my time is currently eaten by my semester.
@6by9 - Testing with two different generations of Nvidia GPUs, I get a similar kernel dump on each with the nouveau driver:
[ 46.089143] nouveau 0001:01:00.0: enabling device (0000 -> 0002)
[ 46.089176] nouveau 0001:01:00.0: NVIDIA GA104 (b74000a1)
[ 46.234439] nouveau 0001:01:00.0: bios: version 94.04.57.00.08
[ 46.235150] ------------[ cut here ]------------
[ 46.235159] nouveau 0001:01:00.0: swiotlb addr 0x00000010fce00000+4096 overflow (mask ffffffff, bus limit ffffffffff).
[ 46.235177] WARNING: CPU: 3 PID: 949 at kernel/dma/swiotlb.c:1594 swiotlb_map+0x2c0/0x2f0
[ 46.235193] Modules linked in: nouveau(+) drm_gpuvm i2c_algo_bit drm_exec gpu_sched drm_display_helper drm_ttm_helper ttm drm_client_lib drm_kms_helper snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd rfcomm algif_hash algif_skcipher af_alg bnep binfmt_misc brcmfmac_wcc spidev aes_ce_blk aes_ce_cipher ghash_ce rpi_hevc_dec hci_uart pisp_be btbcm v4l2_mem2mem videobuf2_dma_contig gf128mul bluetooth sha2_ce sha256_arm64 sha1_ce raspberrypi_hwmon sha1_generic brcmfmac brcmutil ecdh_generic ecc cfg80211 videobuf2_memops videobuf2_v4l2 spi_bcm2835 videodev gpio_keys rfkill videobuf2_common mc raspberrypi_gpiomem rp1_pio rp1_adc rp1_mailbox rp1_fw sg nvmem_rmem joydev hid_multitouch uio_pdrv_genirq uio drm i2c_dev ledtrig_pattern fuse drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6
[ 46.235294] CPU: 3 UID: 0 PID: 949 Comm: modprobe Not tainted 6.15.11-v8-4k-gpu+ #1 PREEMPT
[ 46.235299] Hardware name: Raspberry Pi 500 Rev 1.0 (DT)
[ 46.235302] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 46.235305] pc : swiotlb_map+0x2c0/0x2f0
[ 46.235308] lr : swiotlb_map+0x2c0/0x2f0
[ 46.235310] sp : ffffffc0837b3630
[ 46.235311] x29: ffffffc0837b3640 x28: ffffff8301bb4c00 x27: ffffffd08d667000
[ 46.235316] x26: 000000108121b000 x25: fffffffec20486c0 x24: 0000000000001000
[ 46.235321] x23: ffffff8003a5c0c8 x22: 0000000000000000 x21: 0000000000000000
[ 46.235325] x20: 000000008121b000 x19: 000000108121b000 x18: 00000000ffffffff
[ 46.235330] x17: 667265766f203639 x16: 30342b3030303030 x15: 6563663031303030
[ 46.235334] x14: 3030307830207264 x13: 2e29666666666666 x12: 666666662074696d
[ 46.235338] x11: 696c20737562202c x10: ffffffd08dd1eca8 x9 : ffffffd08c639948
[ 46.235342] x8 : 00000000ffffefff x7 : ffffffd08dd1eca8 x6 : 80000000fffff000
[ 46.235346] x5 : ffffff807ffdd5c8 x4 : 0000000000000000 x3 : 0000000000000000
[ 46.235350] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff8081220000
[ 46.235354] Call trace:
[ 46.235356] swiotlb_map+0x2c0/0x2f0 (P)
[ 46.235360] dma_map_page_attrs+0x1d8/0x368
[ 46.235365] nvkm_fb_ctor+0xc4/0x100 [nouveau]
[ 46.235554] gf100_fb_new_+0x68/0x98 [nouveau]
[ 46.235697] ga102_fb_new+0x6c/0x88 [nouveau]
[ 46.235832] nvkm_device_ctor+0xaec/0x36d8 [nouveau]
[ 46.235968] nvkm_device_pci_new+0xe0/0x2b8 [nouveau]
[ 46.236105] nouveau_drm_probe+0x4c/0x1e0 [nouveau]
[ 46.236244] local_pci_probe+0x48/0xb8
[ 46.236253] pci_device_probe+0xcc/0x1d8
[ 46.236256] really_probe+0xc4/0x2a8
[ 46.236262] __driver_probe_device+0x80/0x140
[ 46.236266] driver_probe_device+0x44/0x170
[ 46.236270] __driver_attach+0x9c/0x1b0
[ 46.236274] bus_for_each_dev+0x84/0xf0
[ 46.236277] driver_attach+0x2c/0x40
[ 46.236281] bus_add_driver+0xec/0x218
[ 46.236285] driver_register+0x68/0x138
[ 46.236289] __pci_register_driver+0x54/0x68
[ 46.236292] nouveau_drm_init+0x24c/0xff8 [nouveau]
[ 46.236427] do_one_initcall+0x64/0x288
[ 46.236431] do_init_module+0x60/0x230
[ 46.236435] load_module+0x1c70/0x1f08
[ 46.236438] __do_sys_init_module+0x188/0x1f8
[ 46.236441] __arm64_sys_init_module+0x24/0x38
[ 46.236444] invoke_syscall+0x50/0x120
[ 46.236448] el0_svc_common.constprop.0+0x48/0xf0
[ 46.236452] do_el0_svc+0x24/0x38
[ 46.236455] el0_svc+0x30/0xd0
[ 46.236459] el0t_64_sync_handler+0x10c/0x138
[ 46.236462] el0t_64_sync+0x198/0x1a0
[ 46.236465] ---[ end trace 0000000000000000 ]---
I've added links to the two cards I've tested. I will also try the open source driver if I get some time.
It's using swiotlb oddly; this will merit a lot of research later. I was under the impression the Pi 5 didn't need swiotlb?
What looks to be happening is that, for some reason, nouveau is defaulting to swiotlb for mapping pages for devices, and it's trying to map an address that is outside of the mmio range?
One quick thought @geerlingguy: I've been testing all of my kernels with a 48 bit address space because I can't run Ryujinx with the smaller 39 bit address space. Maybe the driver's trying to map a higher virtual address than the kernel currently supports?
We don't have the exact same cards. I didn't overclock mine, but it's possible it has higher clocks set by default. Mine is this one, and according to techpowerup, all the listed XFX models have a slightly higher boost clock set by default (but those 8MHz shouldn't make that much of a difference). You can also actually see the GPU frequencies in the GravityMark reports.
Since the runs were done on different driver and GravityMark versions, there are a lot of variables that might affect it. I did usually run my Pi overclocked to 3GHz though (and at PCIe 3.0), which I think would probably be the main reason.
@RSC-Games - Going to quickly recompile here: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/692#issuecomment-3353768036
(Edit: 48-bit addressing didn't change anything, nor did increasing swiotlb=65535 (for 128 MB) in cmdline.txt.)
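For reference, a small sketch of how a `swiotlb=` slab count maps to a buffer size. It assumes the usual 2 KiB slab size (an `IO_TLB_SHIFT` of 11 in the kernel headers); the helper name `swiotlb_bytes` is made up for this example.

```python
IO_TLB_SHIFT = 11                  # assumed: log2 of the swiotlb slab size
SLAB_BYTES = 1 << IO_TLB_SHIFT     # 2048 bytes per slab

def swiotlb_bytes(nslabs: int) -> int:
    """Bounce buffer size in bytes for a given swiotlb= slab count."""
    return nslabs * SLAB_BYTES

size = swiotlb_bytes(65535)
print(f"swiotlb=65535 -> {size} bytes ({size / (1 << 20):.3f} MiB)")
```

Under that assumption, 65535 slabs is one slab short of a full 128 MiB, which is why the value is described as "for 128 MB".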
Okay after a quick google it looks like swiotlb is for bounce buffering for older devices (like cards limited to 32 bit DMA). It probably would've helped a lot with the Pi 4 PCIe issues back in those days...
For some reason Nouveau's trying to map a 64 bit address in the swiotlb range, which is limited to 32 bit addressing, as shown here:
[ 46.235159] nouveau 0001:01:00.0: swiotlb addr 0x00000010fce00000+4096 overflow (mask ffffffff, bus limit ffffffffff)
I'm taking stabs in the dark here as I haven't done in-depth analysis on any of these parts of the kernel, but maybe there's some compile time switch somewhere that's forcing the framebuffer memory allocator to fall back to swiotlb even though the Pi 5 is definitely capable of 64 bit DMA. This is odd and honestly smells more like an odd driver quirk than a Pi issue.
Random thought: @geerlingguy do you know if Nouveau works on an ARM system that has a coherent PCIe bus, like the Radxa or whatever it is? I'm sure you have an ARM board on hand somewhere that is fully SBSA compliant. I'm curious to see what happens there... It should work fine but if it doesn't then we have our work cut out for us.
I'd initially made this comment on https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/2 as it was the earliest GitHub notification I'd had referencing nouveau WARNs. Duplicating here as it seems to be a better place.
I'm slightly intrigued how you added a PCIe device to a Pi500, but I know the hacking you (and red shirt Jeff) do :-)
swiotlb would be using bounce buffers. There should be no need for those, which makes me wonder if you still have dtoverlay=pcie-32bit-dma loaded from testing other cards. Limiting the PCI window size to 4GB may well cause this sort of error.
Then again there is funky code in nouveau trying to set a DMA mask at https://github.com/raspberrypi/linux/blob/rpi-6.12.y/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c#L1687-L1697, except your backtrace says we're in nvkm_device_ctor which is called from line 1674, so just before this.
All most curious, and may imply that dtoverlay=pcie-32bit-dma would help if it's not already loaded (bus limit ffffffffff is 40 bits, which implies that it hasn't got a restriction, but the device does due to mask ffffffff).
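The numbers in the warning can be checked with a few lines of Python. This is a simplified model of the range check, not the kernel's code: it only asks whether the last byte of the mapping lies at or below a given limit. The helper name `fits` is invented for this sketch.

```python
def fits(addr: int, size: int, limit: int) -> bool:
    """True if the last byte of [addr, addr + size) is at or below limit."""
    return addr + size - 1 <= limit

# Values copied from the warning line in the dump.
addr, size = 0x00000010FCE00000, 4096   # "swiotlb addr 0x00000010fce00000+4096"
dma_mask = 0xFFFFFFFF                   # "mask ffffffff"
bus_limit = 0xFFFFFFFFFF                # "bus limit ffffffffff"

print("address needs  ", addr.bit_length(), "bits")
print("device mask is ", dma_mask.bit_length(), "bits")
print("bus limit is   ", bus_limit.bit_length(), "bits")
print("fits device mask:", fits(addr, size, dma_mask))
print("fits bus limit:  ", fits(addr, size, bus_limit))
```

The address needs 37 bits, so it is fine for the 40-bit bus limit but cannot be expressed within the device's 32-bit mask, which matches the "overflow" wording.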
It's probably also worth saying that DRM doesn't play nicely with bounce buffers. DRM expects that once a buffer is on display, the backing buffer can be amended by userspace and that updates what is on the screen, although updates may tear. If there are bounce buffers in the way, that falls apart.
BTW, life may be easier if you start with a Trixie image rather than Bookworm. Latest nightly build is at https://downloads.raspberrypi.com/nightlies/
It might be time to go wild with the printks in dma_map_page_attrs - if you determine the caller then it could be a quite reliable source of debug information... Could trigger an oops or a kernel stack dump in the nouveau code right before the dma mapping is made so we can tell which one is related...
https://github.com/raspberrypi/linux/blob/2f4a28199c418599ba0224186e42926b482b523c/kernel/dma/mapping.c#L155
(nvkm_fb_ctor https://github.com/raspberrypi/linux/blob/2f4a28199c418599ba0224186e42926b482b523c/drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c#L272)
Looking at the callstack:
- dma_map_page_attrs
- nvkm_fb_ctor - https://elixir.bootlin.com/linux/v6.17/source/drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c#L286
- gf100_fb_new_ - https://elixir.bootlin.com/linux/v6.17/source/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c#L114

So the likely cause is that they're assuming they can dma_map_page an arbitrary page allocated by alloc_page(GFP_KERNEL | __GFP_ZERO).
I haven't done any analysis of what flush_page is used for.
Quickest hack to test would be changing https://elixir.bootlin.com/linux/v6.17/source/drivers/gpu/drm/nouveau/nvkm/subdev/fb/gf100.c#L127 to .sysmem.flush_page_init = NULL so that whole block in nvkm_fb_ctor is skipped due to https://elixir.bootlin.com/linux/v6.17/source/drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c#L281
~~This'll be interesting to watch. If we're skipping the framebuffer creation, who knows what's going to happen with the full blown DRM init down the line.... might give a lot more insight though.~~ I guess my main question is why can't we dma_map_page on an arbitrary page on the Pi 5? We can figure that out later
Edit: I misread your comment so ignore all that lol
We're not skipping FB creation. We're disabling what appears to be a cache flushing mechanism. I'm not clear as to which side that cache is sitting (CPU or GPU). In using uncached memory on the CPU side, it may not be required anyway.
dma_map_page can require the allocation to be within certain ranges. kzalloc will allocate at random from anywhere within RAM. Those expecting to be able to map memory into a DMA device should really be using the dma_alloc calls, not kzalloc.
That gets a little further (I needed to modify drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk110.c for my card), but then hits timeouts.
[ 36.216242] nouveau 0001:01:00.0: enabling device (0000 -> 0002)
[ 36.216280] nouveau 0001:01:00.0: NVIDIA GK208B (b060b0b1)
[ 36.367397] nouveau 0001:01:00.0: bios: version 80.28.b8.00.05
[ 36.674395] nouveau 0001:01:00.0: fb: 2048 MiB GDDR5
[ 37.993376] nouveau 0001:01:00.0: drm: VRAM: 2048 MiB
[ 37.993387] nouveau 0001:01:00.0: drm: GART: 1048576 MiB
[ 37.993391] nouveau 0001:01:00.0: drm: TMDS table version 2.0
[ 37.999981] nouveau 0001:01:00.0: drm: MM: using COPY for buffer copies
[ 40.000507] nouveau 0001:01:00.0: drm: core caps notifier timeout
[ 40.002003] [drm] Initialized nouveau 1.4.0 for 0001:01:00.0 on minor 2
[ 40.082695] sysfs: cannot create duplicate filename '/class/graphics/fb0'
[ 40.082716] CPU: 2 UID: 0 PID: 1654 Comm: modprobe Not tainted 6.17.0-v8+ #1 PREEMPT
[ 40.082721] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[ 40.082723] Call trace:
[ 40.082725] show_stack+0x20/0x38 (C)
[ 40.082736] dump_stack_lvl+0x78/0x90
[ 40.082740] dump_stack+0x18/0x28
[ 40.082742] sysfs_warn_dup+0x6c/0x90
[ 40.082745] sysfs_do_create_link_sd+0xf8/0x108
[ 40.082749] sysfs_create_link+0x28/0x50
[ 40.082750] device_add+0x248/0x720
[ 40.082753] device_create_groups_vargs+0xe8/0x148
[ 40.082756] device_create+0x64/0x98
[ 40.082758] fb_device_create+0x58/0xf8
[ 40.082763] do_register_framebuffer+0x154/0x288
[ 40.082766] register_framebuffer+0x34/0x60
[ 40.082768] __drm_fb_helper_initial_config_and_unlock+0x30c/0x5a0 [drm_kms_helper]
[ 40.082806] drm_fb_helper_initial_config+0x4c/0x68 [drm_kms_helper]
[ 40.082819] drm_fbdev_client_hotplug+0x8c/0xe8 [drm_client_lib]
[ 40.082824] drm_client_register+0x60/0xb0 [drm]
[ 40.082920] drm_fbdev_client_setup+0xf0/0xc60 [drm_client_lib]
[ 40.082922] drm_client_setup+0xbc/0xe8 [drm_client_lib]
[ 40.082924] nouveau_drm_probe+0x14c/0x1d8 [nouveau]
[ 40.083116] local_pci_probe+0x48/0xd0
[ 40.083119] pci_device_probe+0xac/0x1b8
[ 40.083121] really_probe+0xc4/0x2a8
[ 40.083124] __driver_probe_device+0x80/0x140
[ 40.083126] driver_probe_device+0x48/0x170
[ 40.083129] __driver_attach+0x9c/0x1b0
[ 40.083132] bus_for_each_dev+0x7c/0xe8
[ 40.083134] driver_attach+0x2c/0x40
[ 40.083136] bus_add_driver+0xec/0x218
[ 40.083138] driver_register+0x68/0x138
[ 40.083141] __pci_register_driver+0x4c/0x60
[ 40.083143] nouveau_drm_init+0x1cc/0xff8 [nouveau]
[ 40.083259] do_one_initcall+0x4c/0x280
[ 40.083262] do_init_module+0x60/0x268
[ 40.083267] load_module+0x1d6c/0x1ef8
[ 40.083270] __do_sys_init_module+0x180/0x200
[ 40.083273] __arm64_sys_init_module+0x24/0x38
[ 40.083276] invoke_syscall+0x50/0x120
[ 40.083278] el0_svc_common.constprop.0+0x48/0xf8
[ 40.083280] do_el0_svc+0x28/0x40
[ 40.083281] el0_svc+0x34/0xf0
[ 40.083284] el0t_64_sync_handler+0xa0/0xe8
[ 40.083287] el0t_64_sync+0x198/0x1a0
[ 40.083937] Unable to create device for framebuffer 0; error -17
[ 40.083953] nouveau 0001:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
[ 42.154952] nouveau 0001:01:00.0: drm: core notifier timeout
[ 44.155062] nouveau 0001:01:00.0: drm: base-0: timeout
The driver shouldn't be trying to force itself to be /class/graphics/fb0 as that is allocated via the DRM core. I'm not even sure how it's managing to do that as do_register_framebuffer should be keeping track of that and know that vc4 has already allocated /dev/fb0.
The timeouts are more important though. kmsprint is identifying my connected monitor correctly, and thinks it's switching to 1080p60, but no apparent output.
@6by9
> swiotlb would be using bounce buffers. There should be no need for those, which makes me wonder if you still have dtoverlay=pcie-32bit-dma loaded from testing other cards. Limiting the PCI window size to 4GB may well cause this sort of error.
I just double-checked and the only difference from stock is I still have dtoverlay=vc4-kms-v3d commented from some earlier testing disabling the internal GPU.
I will switch gears to a Trixie image. Should also solve some compatibility issues I've had with mesa version being newer than expected for some random installers, like those in Pi-Apps, I hope. (Though I'm not sure Pi-Apps will work in Trixie yet. Only one way to find out!)
Edit: I've also added details on my testing setup to the top of this issue, in case someone wants to quickly replicate it. I've found not all eGPU docks are created equal, and the JMT one is the most compatible I've tested.
Seeing as nouveau was blowing up, I've switched to an Intel Arc A380 I've got. I've picked up the extra patch to disable trying to reset the console, otherwise it throws a total wobbler.
Xe is also throwing the same kernel trace for sysfs: cannot create duplicate filename '/class/graphics/fb0'. That implies that something is wrong in the core, and probing either xe or nouveau is trying to evict vc4, which it shouldn't.
I suspect because of that I'm not getting a console on the display, and it also appears to be locking up my SD card. I'm also getting a kernel NULL pointer deref from fbcon_cursor, probably because of the above partially initialised framebuffer. Sorting that is probably worthwhile.
@6by9 - Yeah, that's why I wound up commenting out dtoverlay=vc4-kms-v3d in config.txt; I couldn't get Xe to work at all with the built-in GPU trying to initialize.
Are you saying that could be a bug in the VC4 initialization that Raspberry Pi could fix?
I know in the past, the use case of "there being another graphics accelerator on a Pi" was seen as pretty wildly off course, but seeing how easy it is to use AMD cards now... maybe time to make the iGPU more of a standard iGPU that can easily coexist with external GPUs!
> @6by9 - Yeah, that's why I wound up commenting out dtoverlay=vc4-kms-v3d in config.txt; I couldn't get Xe to work at all with the built-in GPU trying to initialize.
>
> Are you saying that could be a bug in the VC4 initialization that Raspberry Pi could fix?
I don't think this is a bug in vc4.
The DRM core handles allocating the /dev/fbN nodes for the emulated fbdev devices, so it shouldn't allocate duplicates. That's what the registered_fb array at https://elixir.bootlin.com/linux/v6.17/source/drivers/video/fbdev/core/fbmem.c#L34 is meant to be handling.
There is a call aperture_remove_all_conflicting_devices that vc4 (and others) call to evict the simple_framebuffer that the firmware sets up. I'm wondering if these PCIe drivers are calling something else that tries to unbind the console from under vc4, and that is throwing things off.
> I know in the past, the use case of "there being another graphics accelerator on a Pi" was seen as pretty wildly off course, but seeing how easy it is to use AMD cards now... maybe time to make the iGPU more of a standard iGPU that can easily coexist with external GPUs!
We already have multiple DRM devices in play as vc4, v3d, tinydrm, and RP1's DPI, DSI, and VEC are all independent DRM devices. Those all play nicely together, so I'm not sure yet why these PCIe cards are causing grief.
There is an unregister_framebuffer call which would appear to allow detaching of a framebuffer, so that'll be one of my first log points.
That's a little embarrassing. 6.16 and 6.17 have lost a line in the forward port of one of the patches. Without the line at https://github.com/raspberrypi/linux/blob/rpi-6.12.y/drivers/video/fbdev/core/fbmem.c#L421, it always assigned node 0, hence causing the reuse.
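To illustrate why always handing out node 0 produces the "duplicate filename" warning and error -17, here is a toy model in Python. It is not the kernel's code, and it makes no claim about what the missing line actually does. The names `first_free_node` and `register` are invented, and the only facts borrowed from the kernel are that the registered_fb array has 32 slots (`FB_MAX`) and that -17 is `-EEXIST`.

```python
import errno

FB_MAX = 32  # number of slots in the registered_fb array

def first_free_node(registered):
    """Return the lowest unused index, or None if every slot is taken."""
    for i, fb in enumerate(registered):
        if fb is None:
            return i
    return None

def register(registered, name, node):
    """Claim a node; fail with -EEXIST if something already owns it."""
    if registered[node] is not None:
        return -errno.EEXIST
    registered[node] = name
    return node

registered = [None] * FB_MAX
register(registered, "vc4", first_free_node(registered))        # vc4 takes fb0

# Behaviour described above: the second driver is handed node 0 again.
print(register(registered, "xe", 0))                             # -17 (-EEXIST)

# Expected behaviour: the second driver gets the next free node.
print(register(registered, "xe", first_free_node(registered)))   # 1
```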
Both vc4 and xe are now initialising at the console. Starting labwc complains of not having the iris driver installed, so I may be needing to rebuild mesa anyway.
@6by9 you should be able to use the bookworm-backports version of mesa. This is an Alchemist card, right?
> @6by9 you should be able to use the bookworm-backports version of mesa. This is an Alchemist card, right?
It's an Arc A380, which wiki says is an Alchemist card (Arc 3).
Debian appears to say that iris_dri.so isn't in any of the variants of libgl1-mesa-dri - https://packages.debian.org/bookworm-backports/arm64/libgl1-mesa-dri/filelist. apt-file search iris_dri.so also turns up a blank.
Have they hidden it away in some separate package?
(Nearly built from master anyway)
@6by9 on Bookworm, I followed this process and Xe was happy with both my A750 and B580: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/510#issuecomment-2605722331
So not sure if maybe the A380 could be different?
> @6by9 on Bookworm, I followed this process and Xe was happy with both my A750 and B580: #510 (comment)
> So not sure if maybe the A380 could be different?
Built fine. I should have read all of your thread as labwc is throwing
labwc: ../src/intel/dev/intel_hwconfig.c:153: process_hwconfig_table: Assertion 'next <= end' failed.
which you'd already reported.
That looks to be a Mesa issue as kmscube is also triggering the same issue.
I haven't looked at what the hwconfig table is that it's trying to parse beyond finding the assert.
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/dev/intel_hwconfig.c?ref_type=heads#L153
So it's parsing some table of stuff, which is a concatenation of
struct hwconfig {
    uint32_t key;
    uint32_t len;
    uint32_t val[];
};
entries. A bit of logging gives me
current 0x557d81de30 end 0x557d81e13c
current 0x557d81de30 key is 00000001 len 1, next 0x557d81de3c
current 0x557d81de3c key is 00000002 len 1, next 0x557d81de48
current 0x557d81de48 key is 00000003 len 1, next 0x557d81de54
current 0x557d81de54 key is 00000004 len 1, next 0x557d81de60
current 0x557d81de60 key is 0000003f len 1, next 0x557d81de6c
current 0x557d81de6c key is 00000040 len 0, next 0x557d81de74
current 0x557d81de74 key is 00000000 len 0, next 0x557d81de7c
current 0x557d81de7c key is 00000000 len 0, next 0x557d81de84
current 0x557d81de84 key is 00000000 len 0, next 0x557d81de8c
current 0x557d81de8c key is 00000000 len 0, next 0x557d81de94
current 0x557d81de94 key is 00000000 len 0, next 0x557d81de9c
current 0x557d81de9c key is 00000000 len 0, next 0x557d81dea4
current 0x557d81dea4 key is 00000000 len 0, next 0x557d81deac
current 0x557d81deac key is 00000000 len 1, next 0x557d81deb8
current 0x557d81deb8 key is 00000001 len 1075904512, next 0x567e05dec0
Key values come from https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/dev/intel_hwconfig_types.h#L17, so 0 is invalid, and that length of 1075904512 is total rubbish. I don't know if that is my card advertising rubbish, or another artifact of cache issues.
Add a break if key is 0, and remove assert(current == end); from the end, and both kmscube and labwc-pi work! labwc-pi has a few artifacts, but kmscube is fine.
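The table walk and the workaround can be modelled in a few lines of Python. This is a sketch of the record layout described above (a key, a length, then that many 32-bit values), not Mesa's implementation. The names `walk_hwconfig` and `record` are invented, little-endian words are assumed, and the table is synthetic, shaped like the logged one rather than read from a real card.

```python
import struct

def walk_hwconfig(blob: bytes, stop_on_zero_key: bool = False):
    """Walk (key, len, val[len]) records made of little-endian uint32 words."""
    entries, offset, end = [], 0, len(blob)
    while offset + 8 <= end:
        key, length = struct.unpack_from("<II", blob, offset)
        if stop_on_zero_key and key == 0:
            break                       # workaround: key 0 is not a valid entry
        nxt = offset + 8 + 4 * length   # 8-byte header plus len 32-bit values
        if nxt > end:                   # same condition the Mesa assert guards
            raise ValueError(
                f"entry at {offset:#x} overruns table: key={key:#x} len={length}")
        values = tuple(struct.unpack_from("<I", blob, offset + 8 + 4 * i)[0]
                       for i in range(length))
        entries.append((key, values))
        offset = nxt
    return entries

def record(key, *values, length=None):
    """Pack one record; 'length' may lie about the payload to mimic corruption."""
    n = len(values) if length is None else length
    return struct.pack("<II", key, n) + b"".join(struct.pack("<I", v) for v in values)

# Five one-word entries and one empty entry, then zero-key padding and a
# record with a nonsensical length, as in the log.
good = b"".join(record(k, 0) for k in (1, 2, 3, 4, 0x3F)) + record(0x40)
table = good + record(0) * 7 + record(0, 0) + record(1, length=1075904512)

print([hex(k) for k, _ in walk_hwconfig(table, stop_on_zero_key=True)])
```

Calling `walk_hwconfig(table)` without the flag raises `ValueError` at the last record, which is the Python analogue of the failed `next <= end` assertion. The logged addresses agree with this layout: each `len 1` entry advances by 12 bytes and each `len 0` entry by 8.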
......
Take that back. Rebooted with vc4-kms-v3d enabled, and the hwconfig table has changed and fails totally.
I have a suspicion that mesa isn't checking carefully enough that it has the correct DRM node for the Xe card.
- Blacklist v3d but leave vc4 and it doesn't work.
- Blacklist vc4 but leave v3d and it works.
- Blacklist vc4, modprobe xe first, then modprobe vc4, and it also works, however the hwconfig list is truncated.
- Blacklist all 3, modprobe xe, then vc4, then v3d, and it works with a correct looking hwconfig list.
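When trying load orders like these, it can help to see which kernel driver ended up behind each DRM node. The sketch below is a hypothetical helper in Python, not something used in this thread. It relies only on the standard sysfs layout, where `/sys/class/drm/cardN/device/driver` is a symlink to the bound driver, and it says nothing about how Mesa itself picks a node.

```python
from pathlib import Path

def drm_node_drivers(sysfs_drm="/sys/class/drm"):
    """Map each cardN / renderDN node to the kernel driver bound to its device."""
    base = Path(sysfs_drm)
    result = {}
    if not base.is_dir():
        return result                  # not Linux, or no DRM devices present
    for node in sorted(base.iterdir()):
        name = node.name
        if not (name.startswith("card") or name.startswith("renderD")):
            continue                   # skip files such as "version"
        if "-" in name:
            continue                   # skip connectors such as card1-HDMI-A-1
        driver_link = node / "device" / "driver"
        if driver_link.exists():
            # The symlink target's last path component is the driver name.
            result[name] = driver_link.resolve().name
    return result

if __name__ == "__main__":
    for name, driver in drm_node_drivers().items():
        print(f"{name}: {driver}")
```

Running it after each blacklist/modprobe combination shows whether the card and render node numbering moved, which is one thing that changes between the cases listed above.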
I've pushed my in-progress branch to https://github.com/raspberrypi/linux/pull/7072 for backup and CI builds.
> Blacklist all 3, modprobe xe, then vc4, then v3d, and it works with a correct looking hwconfig list.
With that config I can start labwc-pi with the desktop spread across both Intel and vc4 display outputs, although there are definite composition issues. This is becoming tasty! Time to try amdgpu seeing as you say that is working better than either of the others.
AMDGPU at this point is near perfect, though I did manage to get a bit of artifacting a few hours in after pushing the system pretty hard with Vulkan.