raspberry-pi-pcie-devices
Test GPU (XFX AMD Radeon RX 460 4GB GDDR5)
The RX 460 is a Polaris-era AMD GPU. @Coreforge did a good amount of work getting one running, documented in #6.
We broke out this separate issue since the original RX 550 issue is already a bit long, and we are both testing on a Raspberry Pi 5 now, where this card may have more opportunity to shine.
Note: See later in this issue for more updated instructions, for full accelerated 4K rendering and display output.
Using Coreforge's 6.1.x kernel fork, if you recompile the kernel, you'll end up with a working HDMI output, with working console output:
pi@pi5:~ $ neofetch
_,met$$$$$gg. pi@pi5
,g$$$$$$$$$$$$$$$P. ------
,g$$P" """Y$$.". OS: Debian GNU/Linux 12 (bookworm) aarch64
,$$P' `$$$. Host: Raspberry Pi 5 Model B Rev 1.0
',$$P ,ggs. `$$b: Kernel: 6.1.62-v8_16k+
`d$$' ,$P"' . $$$ Uptime: 8 mins
$$P d$' , $$P Packages: 1604 (dpkg)
$$: $$. - ,d$$' Shell: bash 5.2.15
$$; Y$b._ _,d$P' Resolution: 1920x1080
Y$$. `.`"Y$$$$P"' Terminal: /dev/pts/0
`$$b "-.__ CPU: (4) @ 2.400GHz
`Y$$ GPU: AMD ATI Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X
`Y$$. Memory: 207MiB / 8053MiB
`$$b.
`Y$$b.
`"Y$b._
`"""
I have not been able to get wayfire/lightdm working (it sits there on a blinking cursor screen, and the wireplumber process seems to get stuck on something under the lightdm user). Coreforge was running with X11 and seemed to be able to run glmark2, Minecraft, Portal 1 and 2, and some other games, but is currently running with a PCIe x1 Gen 1 connection.
To use radeontop:
sudo apt install -y libdrm-dev libncurses-dev libxcb-dri2-0-dev
git clone https://github.com/clbr/radeontop.git
cd radeontop
make
./radeontop
Since I'm having trouble getting into lightdm / wayfire, it's slightly less useful to me right now though :D
If I use raspi-config to boot to the CLI instead of the desktop and then try running wayfire-pi, I get:
$ wayfire-pi
II 23-11-23 12:51:00.366 - [src/main.cpp:280] Starting wayfire version 0.7.5
II 23-11-23 12:51:00.366 - [libseat] [libseat/backend/seatd.c:64] Could not connect to socket /run/seatd.sock: No such file or directory
II 23-11-23 12:51:00.366 - [libseat] [libseat/libseat.c:76] Backend 'seatd' failed to open seat, skipping
Bus error
And:
$ startx
... get logged errors ...
$ cat /home/pi/.local/share/xorg/Xorg.0.log
...
[ 1476.387] (II) Applying OutputClass "AMDgpu" options to /dev/dri/card2
[ 1476.387] (==) modeset(G0): RGB weight 888
[ 1476.387] (==) modeset(G0): Default visual is TrueColor
[ 1476.387] (II) Loading sub module "glamoregl"
[ 1476.387] (II) LoadModule: "glamoregl"
[ 1476.387] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[ 1476.387] (II) Module glamoregl: vendor="X.Org Foundation"
[ 1476.387] compiled for 1.21.1.7, module version = 1.0.1
[ 1476.387] ABI class: X.Org ANSI C Emulation, version 0.4
[ 1476.394] (EE)
[ 1476.395] (EE) Backtrace:
[ 1476.397] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x188) [0x5555b82fc668]
[ 1476.397] (EE) unw_get_proc_info failed: no unwind info found [-10]
[ 1476.397] (EE)
[ 1476.398] (EE) Bus error at address 0x7ffec3a78080
[ 1476.398] (EE)
Fatal server error:
[ 1476.398] (EE) Caught signal 7 (Bus error). Server aborting
[ 1476.398] (EE)
[ 1476.398] (EE)
I grabbed Coreforge's memcpy library:
wget https://gist.githubusercontent.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359/raw/b4848d1da9fff0cfcf7b601713efac1909e408e8/memcpy_unaligned.c
gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
sudo mv memcpy.so /usr/local/lib/memcpy.so
sudo nano /etc/ld.so.preload
# Put the following line inside ld.so.preload:
/usr/local/lib/memcpy.so
That got much further with wayfire...
II 23-11-23 12:57:46.203 - [backend/drm/drm.c:1553] Found connector 'DVI-D-1'
II 23-11-23 12:57:46.203 - [backend/drm/drm.c:1614] connector HDMI-A-3: Requesting modeset
II 23-11-23 12:57:46.203 - [src/core/output-layout.cpp:1098] new output: HDMI-A-3
II 23-11-23 12:57:46.203 - [src/core/output-layout.cpp:537] loaded mode auto
II 23-11-23 12:57:46.231 - [backend/drm/drm.c:734] connector HDMI-A-3: Modesetting with 1920x1080 @ 60.000 Hz
(type equals variant: [type: string, value: toplevel] | (type equals variant: [type: string, value: x-or] & focusable equals variant: [type: bool, value: 1]))
type equals variant: [type: string, value: overlay]
false
false
false
app_id equals variant: [type: string, value: Kodi]
(type equals variant: [type: string, value: toplevel] & floating equals variant: [type: bool, value: 1])
II 23-11-23 12:57:46.288 - [backend/drm/drm.c:1502] Scanning DRM connectors on /dev/dri/card1
II 23-11-23 12:57:46.290 - [backend/drm/drm.c:1553] Found connector 'HDMI-A-1'
II 23-11-23 12:57:46.294 - [backend/drm/drm.c:1553] Found connector 'HDMI-A-2'
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "smart-kvm Multifunction USB Device" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "pwr_button" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "vc4-hdmi-0" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "vc4-hdmi-1" to output (not found in this cursor)
EE 23-11-23 12:57:46.296 - [render/allocator/gbm.c:147] gbm_bo_create failed
EE 23-11-23 12:57:46.296 - [render/swapchain.c:109] Failed to allocate buffer
startx also got further, but I'm not sure what's up: it just ends up not rendering a display through the RX 460 at the point I run it (the system is not locked up, however).
[Edit: See the comment later about enabling one of the kernel features so the alignment faults can be fixed.]
On the site now: https://pipci.jeffgeerling.com/cards_gpu/xfx-radeon-rx460-4gb.html
Was there anything in dmesg when running wayfire or x11?
I saw you had some issues compiling in #6. compat_alignment.c might only get compiled if Kernel Features -> Kernel support for 32bit EL0 -> Fix up misaligned multi-word loads and stores in user space is enabled (I should probably move the code into a separate file, as that option is disabled by default).
There might be something in newer mesa versions that doesn't get entirely fixed by the memcpy library and is now causing issues with startx as well, as I could get that running before without additional alignment handling in the kernel. Wayfire was triggering the alignment trap a few times though, so that currently won't work without it. If it's still getting stuck somewhere (with the alignment trap), dmesg will likely get spammed with essentially the same error over and over again. To add the relevant instruction(s), I'd need at least the Faulting instruction: line, and ideally the Load/Store: op0.... line as well if it's there. I'm currently just adding them as I encounter issues, as there are quite a lot of load/store instructions on arm64.
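For readers following the logs in this thread, the op0 to op4 values on the Load/Store: line appear to be the fields of the A64 "Loads and Stores" encoding group. The sketch below is not taken from the kernel fork. It assumes the usual bit positions (op0 = bits 31:28, op1 = bit 26, op2 = bits 24:23, op3 = bits 21:16, op4 = bits 11:10), and the function names are made up for illustration. The three instruction words are the ones reported in this issue.

```python
def load_store_fields(insn: int) -> dict:
    """Split a 32-bit A64 instruction word into op0..op4 (assumed bit positions)."""
    return {
        "op0": (insn >> 28) & 0xF,   # bits 31:28
        "op1": (insn >> 26) & 0x1,   # bit 26
        "op2": (insn >> 23) & 0x3,   # bits 24:23
        "op3": (insn >> 16) & 0x3F,  # bits 21:16
        "op4": (insn >> 10) & 0x3,   # bits 11:10
    }


def format_fields(insn: int) -> str:
    """Render the fields in the same shape as the dmesg line."""
    fields = load_store_fields(insn)
    return "Load/Store: " + " ".join(f"{name} {value:#x}" for name, value in fields.items())


if __name__ == "__main__":
    # Instruction words copied from the dmesg output posted in this issue.
    for word in (0xA9001444, 0xA9000C22, 0xF8226865):
        print(f"{word:#010x} -> {format_fields(word)}")
```

If the assumed bit positions are right, the printed fields should line up with the Load/Store: lines that accompany each Faulting instruction: value in the logs.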
My card has 4gb of vram as well, so that's not an issue.
@Coreforge - indeed, after enabling that flag, I can compile (with a number of warnings), rebooting now...
Running wayfire-pi, while the environment initializes, I see:
[ 40.300504] Alignment fixup
[ 40.300510] Faulting instruction: 0xa9001444
[ 40.300513] Load/Store
[ 40.300515] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[ 40.300517] Storing 8 bytes (pair: 1) to 0x7fff5056016c
[ 40.309090] Alignment fixup
[ 40.309098] Faulting instruction: 0xa9000c22
[ 40.309101] Load/Store
[ 40.309102] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x3
[ 40.309105] Storing 8 bytes (pair: 1) to 0x7fff50568e7c
[ 41.159727] Alignment fixup
[ 41.159732] Faulting instruction: 0xa9001444
[ 41.159735] Load/Store
[ 41.159737] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[ 41.159739] Storing 8 bytes (pair: 1) to 0x7fff5056056c
[ 41.289474] Alignment fixup
[ 41.289486] Faulting instruction: 0xa9001444
[ 41.289490] Load/Store
[ 41.289491] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[ 41.289494] Storing 8 bytes (pair: 1) to 0x7fff5056096c
When I clicked on the Pi menu, I saw:
[ 132.284968] Alignment fixup
[ 132.284976] Faulting instruction: 0xa9001444
[ 132.284980] Load/Store
[ 132.284983] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
When I opened up Chromium there were maybe 30 or so fixups.
I ran glmark2 and it was spitting out hundreds (?) of fixups per second; certainly a huge number was running through. It seemed to fail during [ideas], but it got through a bit before doing so... getting over 2,000 fps during a number of tests.
It did give the GPU some work, though!
Not enough to kick in the internal fans it seems... I think they work :P (the fun of testing used hardware...).
The last time I installed Minecraft on a Pi I just used Pi-Apps — is there a preferred place where you grab it?
Were there a bunch of fixup messages in dmesg without the Storing %d bytes message when glmark failed? (They might get mixed up a bit, but if it's getting stuck on an instruction, there should be a lot of other messages without that one.)
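One way to answer that question from a saved dmesg dump is to count, per faulting instruction, how many fixup attempts were followed by a Storing line. This is a minimal sketch that only recognizes the message formats quoted in this thread. It does not know about any message the handler may print for loads, and lost or interleaved kernel messages will skew the counts. The function name is hypothetical.

```python
import re
from collections import Counter

STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")            # leading "[ 564.557038] "
FAULT = re.compile(r"Faulting instruction: (0x[0-9a-fA-F]+)")


def tally_fixups(dmesg_text: str) -> dict:
    """Return {instruction: (times_seen, times_followed_by_Storing)}."""
    seen, completed = Counter(), Counter()
    current = None
    for raw in dmesg_text.splitlines():
        line = STAMP.sub("", raw.strip())
        match = FAULT.match(line)
        if match:
            current = match.group(1).lower()
            seen[current] += 1
        elif line.startswith("Storing ") and current is not None:
            completed[current] += 1
            current = None
    return {insn: (seen[insn], completed[insn]) for insn in seen}


SAMPLE = """\
[ 564.557044] Faulting instruction: 0xf8226865
[ 564.557047] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[ 564.557051] Faulting instruction: 0xf8226865
[ 564.560317] Faulting instruction: 0xa9001444
[ 564.560324] Storing 8 bytes (pair: 1) to 0x7fff3831296c
"""

if __name__ == "__main__":
    for insn, (count, done) in tally_fixups(SAMPLE).items():
        print(f"{insn}: seen {count}, completed {done}")
```

An instruction that is seen many times but never completed is a likely candidate for one the handler does not support yet.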
The fans were occasionally spinning up on my card, but not very much, so depending on the fan curve, cooler, and power profile on your card it might just not get warm enough with these loads.
I'm just running minecraft from a technic install where I replaced the native libraries with arm versions and echoed out the actual launch command (since the launcher would otherwise overwrite the libraries again with x86 versions). I think some launchers directly support arm now, but I haven't tried any in a while.
SuperTuxKart on max settings would probably be a good benchmark for these cards, as it's arm64 native, OpenGL/GLES based, and in the Raspbian repos.
A few notes:
- Fans spin up (slowly then fast briefly) at boot, so at least I know they work :)
- Haven't tried games yet, but SuperTuxKart on high would be good to see, for sure (cc @qwertychouskie)
- For some reason the mouse cursor disappears once I move the mouse at all. Makes using the GUI a bit fun :D (Does this happen to you too @Coreforge? Ah... I see it did.)
- I've pasted below the last few dozen messages from my last failed glmark2 run:
[ 564.557038] Faulting instruction: 0xf8226865
[ 564.557039] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[ 564.557039] Load/Store
[ 564.557041] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[ 564.557043] Alignment fixup
[ 564.557044] Faulting instruction: 0xf8226865
[ 564.557046] Load/Store
[ 564.557047] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[ 564.557049] Alignment fixup
[ 564.557051] Faulting instruction: 0xf8226865
[ 564.557052] Load/Store
[ 564.557053] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[ 564.557055] Alignment fixup
[ 564.557057] Faulting instruction: 0xf8226865
[ 564.557058] Load/Store
[ 564.557059] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[ 564.557062] Alignment fixup
[ 564.557063] Faulting instruction: 0xf8226865
[ 564.557064] Load/Store
[ 564.557065] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[ 564.557068] Alignment fixup
[ 564.557069] Faulting instruction: 0xf8226865
[ 564.557071] Load/Store
[ 564.557072] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[ 564.557074] Alignment fixup
[ 564.557075] Faulting instruction: 0xf8226865
[ 564.557077] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[ 564.557078] Load/Store
[ 564.557146] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[ 564.557254] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[ 564.560311] Alignment fixup
[ 564.560317] Faulting instruction: 0xa9001444
[ 564.560320] Load/Store
[ 564.560321] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[ 564.560324] Storing 8 bytes (pair: 1) to 0x7fff3831296c
[ 564.760582] systemd[1]: Started systemd-journald.service - Journal Service.
It was stuck on [ideas] both times, I think. (The blue wireframey wavey one.)
This thread may get a little more activity, I just posted a video: You can use external GPUs on the Raspberry Pi 5.
The last time I installed Minecraft on a Pi I just used Pi-Apps — is there a preferred place where you grab it?
@geerlingguy Pi-Apps to install Minecraft (Minecraft Bedrock, Minecraft Java with Prism Launcher, and Minecraft Pi) will work well. All of them are native ARM64. I would love to see Minecraft Java with Prism Launcher running on that setup hopefully with the Simply Optimized modpack or similar.
I need to try this myself at some point (I have an RX 570 8GB which is still a polaris card)
I guess one of those mining riser cards like this
https://www.amazon.com/BEYIMEI-VER010-X-Adapter-Bitcoin-Ethereum/dp/B09BVNSFN8?source=ps-sl-shoppingads-lpcontext&ref_=fplfs&smid=A1BM86NEBPKXB0&th=1
plus the m.2 hat should do the trick
https://pineberrypi.com/products/hat-top-2230-2240-for-rpi5
Regarding the cursor, according to this YouTube comment I could add the environment variable WLR_NO_HARDWARE_CURSORS=1 to use software cursors, and that would hopefully keep it visible for now :)
The faulting instruction seems to just be a 64bit store (which, since I haven't encountered them, I haven't added yet). I'll hopefully get it added in the next few days.
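As a cross-check on that reading, here is a rough hand decoder for the two instruction shapes that show up in the logs above. It is a sketch based on the standard A64 encodings for STP (signed offset, 64-bit) and STR (register offset, 64-bit), not code from the kernel fork, and it deliberately ignores every other load/store form. The helper names are invented for illustration, and a real disassembler such as objdump remains the authoritative check.

```python
def _reg(n: int, sp: bool = False) -> str:
    """Name a 64-bit register; encoding 31 means sp for a base, xzr for data."""
    if n == 31:
        return "sp" if sp else "xzr"
    return f"x{n}"


def decode_store(insn: int) -> str:
    rt = insn & 0x1F
    rn = (insn >> 5) & 0x1F
    # STP (signed offset, 64-bit): bits 31:22 == 0b1010100100
    if (insn & 0xFFC00000) == 0xA9000000:
        rt2 = (insn >> 10) & 0x1F
        imm7 = (insn >> 15) & 0x7F
        if imm7 & 0x40:              # sign-extend the 7-bit immediate
            imm7 -= 0x80
        offset = imm7 * 8            # scaled by the 8-byte register size
        base = _reg(rn, sp=True)
        address = f"[{base}]" if offset == 0 else f"[{base}, #{offset}]"
        return f"stp {_reg(rt)}, {_reg(rt2)}, {address}"
    # STR (register, 64-bit): bits 31:21 == 0b11111000001, bits 11:10 == 0b10
    if (insn & 0xFFE00C00) == 0xF8200800:
        rm = (insn >> 16) & 0x1F
        option = (insn >> 13) & 0x7
        shift = (insn >> 12) & 0x1
        if option == 0b011:          # LSL: plain 64-bit index register
            index = _reg(rm) + (", lsl #3" if shift else "")
            return f"str {_reg(rt)}, [{_reg(rn, sp=True)}, {index}]"
    return f"unhandled {insn:#010x}"


if __name__ == "__main__":
    for word in (0xF8226865, 0xA9001444, 0xA9000C22):
        print(f"{word:#010x}: {decode_store(word)}")
```

Under these assumptions 0xf8226865 comes out as a single 64-bit register-offset store, while the two 0xa9... words are 64-bit store pairs, which matches the "pair: 1" lines in the earlier output.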
If PCIe on the Pi 5 is anything like on the LX2160A, the GPU might fall off the bus if the PCIe link rate changes as a power-savings feature. One way to work around that is to set amdgpu.pcie_gen_cap to 0x10001 for gen1, 0x20002 for gen2, or 0x40004 for gen3. While you're at it, you might try amdgpu.aspm=0.
There's also a double-negative amdgpu.noretry=0 to enable retries of... something, I don't know exactly what.
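The three pcie_gen_cap values quoted above follow a simple pattern: the same generation bit is set in both 16-bit halves of the word. A small sketch that reproduces exactly those values and formats them as kernel command-line arguments follows. The helper names are hypothetical, and where the arguments go (for example the kernel command line file on Raspberry Pi OS) is an assumption to verify on your own system.

```python
def pcie_gen_cap(gen: int) -> int:
    """Reproduce the values quoted in this thread: gen1, gen2, gen3 only."""
    if gen not in (1, 2, 3):
        raise ValueError("only gen 1, 2 and 3 are covered by the quoted values")
    bit = 1 << (gen - 1)
    return (bit << 16) | bit        # same bit in the upper and lower half


def amdgpu_cmdline(gen: int, aspm: int = 0) -> str:
    """Build the module parameters suggested above as one string."""
    return f"amdgpu.pcie_gen_cap={pcie_gen_cap(gen):#x} amdgpu.aspm={aspm}"


if __name__ == "__main__":
    for gen in (1, 2, 3):
        print(amdgpu_cmdline(gen))
```

This only pins a single generation, as in the suggestion above; it does not try to express "up to gen N".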
I'm also curious if amdgpu's HDMI audio sounds correct on the Pi. On the LX2160A, the audio comes out crackly and garbled.
SuperTuxKart on max settings would probably be a good benchmark for these cards, as it's arm64 native, OpenGL/GLES based, and in the Raspbian repos.
Another good thing to try is the game Veloren. The launcher for it is available in flatpak: net.veloren.airshipper
It's a game that has support for Vulkan, DX12, and Metal, with native ARM64 versions for Linux and Mac OS.
If you create a world with a fixed non-zero seed, and then Spectate World, I think you should end up with the same map viewed from about the same place, possibly with different weather at a given moment.
Seems like AMD is also working on it: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.7-rc3&id=ba0fb4b48c19a2d2380fc16ca4af236a0871d279
Did you talk about it on the amd-gfx mailing list?
I got the uPCity today (Thanks to the Pineberry people again!), but haven't had too much time to do benchmarks yet.
glmark2 is now also crashing for me on the ideas test with the same instruction, so I guess I just had some old library that got updated when I installed wayfire that did something differently. Minecraft has a similar issue, so until I get that added, I only have this partial run of glmark2. Since I didn't do another run with the old adapter on the same libraries, I don't think the numbers can be compared directly. I saw a GPU utilization of 95% through a good part of the run though, so it doesn't seem like it's getting CPU limited (and toning down the logging should also help with that).
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: AMD
GL_RENDERER: AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
GL_VERSION: 4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 342 FrameTime: 2.929 ms
[build] use-vbo=true: FPS: 1799 FrameTime: 0.556 ms
[texture] texture-filter=nearest: FPS: 1824 FrameTime: 0.548 ms
[texture] texture-filter=linear: FPS: 1875 FrameTime: 0.533 ms
[texture] texture-filter=mipmap: FPS: 1866 FrameTime: 0.536 ms
[shading] shading=gouraud: FPS: 1750 FrameTime: 0.572 ms
[shading] shading=blinn-phong-inf: FPS: 1770 FrameTime: 0.565 ms
[shading] shading=phong: FPS: 1741 FrameTime: 0.574 ms
[shading] shading=cel: FPS: 1734 FrameTime: 0.577 ms
[bump] bump-render=high-poly: FPS: 1755 FrameTime: 0.570 ms
[bump] bump-render=normals: FPS: 1810 FrameTime: 0.553 ms
[bump] bump-render=height: FPS: 1821 FrameTime: 0.549 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1411 FrameTime: 0.709 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 508 FrameTime: 1.972 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1434 FrameTime: 0.698 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 523 FrameTime: 1.913 ms
[desktop] effect=shadow:windows=4: FPS: 904 FrameTime: 1.107 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 123 FrameTime: 8.158 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 136 FrameTime: 7.383 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 158 FrameTime: 6.364 ms
I'll try SuperTuxKart as well once I get the things I had running before working again.
Hello, one benchmark that I would suggest is GravityMark. It has support for modern features such as ray tracing and I think that it has a native AArch64 build for all major desktop operating systems - certainly for Linux (I have tried it myself on an Ampere eMAG machine).
Geekbench's compute benchmark is another option, covering a different GPU use case.
@volyrique Geekbench compute (at least the vulkan backend) isn't in the linux arm64 geekbench 5/6 builds. It is only in the x86_64 builds. I have been asking jfpoole to add it for a year now but they won't add it for some reason. You can run the geekbench 5/6 x86_64 compute benchmark through box64 though and it does work fully with (probably) minimal overhead. That is what I did to get the geekbench vulkan compute benchmark results here -> https://forums.raspberrypi.com/viewtopic.php?p=2144650#p2140061 on Pi4 and Pi5
That still leaves OpenCL as an option, doesn't it?
glmark2 on mostly updated packages at gen3 speeds:
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: AMD
GL_RENDERER: AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
GL_VERSION: 4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 369 FrameTime: 2.717 ms
[build] use-vbo=true: FPS: 2086 FrameTime: 0.480 ms
[texture] texture-filter=nearest: FPS: 2082 FrameTime: 0.480 ms
[texture] texture-filter=linear: FPS: 2078 FrameTime: 0.481 ms
[texture] texture-filter=mipmap: FPS: 2069 FrameTime: 0.484 ms
[shading] shading=gouraud: FPS: 1861 FrameTime: 0.537 ms
[shading] shading=blinn-phong-inf: FPS: 1863 FrameTime: 0.537 ms
[shading] shading=phong: FPS: 1861 FrameTime: 0.537 ms
[shading] shading=cel: FPS: 1862 FrameTime: 0.537 ms
[bump] bump-render=high-poly: FPS: 1868 FrameTime: 0.535 ms
[bump] bump-render=normals: FPS: 2105 FrameTime: 0.475 ms
[bump] bump-render=height: FPS: 2095 FrameTime: 0.477 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1411 FrameTime: 0.709 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 511 FrameTime: 1.959 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1501 FrameTime: 0.666 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 521 FrameTime: 1.923 ms
[desktop] effect=shadow:windows=4: FPS: 904 FrameTime: 1.107 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 130 FrameTime: 7.693 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 139 FrameTime: 7.212 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 165 FrameTime: 6.080 ms
[ideas] speed=duration: FPS: 1698 FrameTime: 0.589 ms
[jellyfish] <default>: FPS: 1015 FrameTime: 0.986 ms
[terrain] <default>: FPS: 126 FrameTime: 7.938 ms
[shadow] <default>: FPS: 1551 FrameTime: 0.645 ms
[refract] <default>: FPS: 165 FrameTime: 6.093 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2040 FrameTime: 0.490 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2032 FrameTime: 0.492 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2038 FrameTime: 0.491 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2034 FrameTime: 0.492 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2040 FrameTime: 0.490 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2039 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2041 FrameTime: 0.490 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2028 FrameTime: 0.493 ms
=======================================================
glmark2 Score: 1463
=======================================================
Most of the benchmarks were running with the GPU at 100%, but some (especially buffer) were heavily CPU bound, with the GPU sitting at only around 20%. That's likely due to the alignment trap getting triggered a lot (it's getting triggered a lot on the other tests too, but likely not nearly as much). Optimizing the trap would likely improve it a bit, but the better option would be finding which library/function is causing those issues and fixing that (since it didn't happen before I updated a bunch of stuff, it should be some system library, probably some part of mesa).
And at gen1 speeds:
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: AMD
GL_RENDERER: AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
GL_VERSION: 4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 287 FrameTime: 3.485 ms
[build] use-vbo=true: FPS: 2078 FrameTime: 0.481 ms
[texture] texture-filter=nearest: FPS: 2080 FrameTime: 0.481 ms
[texture] texture-filter=linear: FPS: 2081 FrameTime: 0.481 ms
[texture] texture-filter=mipmap: FPS: 2072 FrameTime: 0.483 ms
[shading] shading=gouraud: FPS: 1863 FrameTime: 0.537 ms
[shading] shading=blinn-phong-inf: FPS: 1862 FrameTime: 0.537 ms
[shading] shading=phong: FPS: 1857 FrameTime: 0.539 ms
[shading] shading=cel: FPS: 1858 FrameTime: 0.538 ms
[bump] bump-render=high-poly: FPS: 1864 FrameTime: 0.537 ms
[bump] bump-render=normals: FPS: 2097 FrameTime: 0.477 ms
[bump] bump-render=height: FPS: 2086 FrameTime: 0.479 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1401 FrameTime: 0.714 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 509 FrameTime: 1.967 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1506 FrameTime: 0.664 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 514 FrameTime: 1.949 ms
[desktop] effect=shadow:windows=4: FPS: 884 FrameTime: 1.131 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 81 FrameTime: 12.470 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 136 FrameTime: 7.406 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 100 FrameTime: 10.057 ms
[ideas] speed=duration: FPS: 1535 FrameTime: 0.652 ms
[jellyfish] <default>: FPS: 1012 FrameTime: 0.989 ms
[terrain] <default>: FPS: 126 FrameTime: 7.964 ms
[shadow] <default>: FPS: 1549 FrameTime: 0.646 ms
[refract] <default>: FPS: 165 FrameTime: 6.097 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2040 FrameTime: 0.490 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2037 FrameTime: 0.491 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2035 FrameTime: 0.491 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2038 FrameTime: 0.491 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2037 FrameTime: 0.491 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2038 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2035 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2039 FrameTime: 0.491 ms
=======================================================
glmark2 Score: 1450
=======================================================
The GPU utilization was at 100% through most of the run as well. Most of the scores are the same, but some benchmarks (mainly mapped buffer, which transfers a lot of data) were affected quite a bit.
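To put numbers on that comparison, the per-test lines from the two runs can be parsed and diffed. This is a minimal sketch that assumes the result-line format shown in the glmark2 output above. The embedded samples are a few lines copied from the gen3 and gen1 runs, and the function names are made up for illustration.

```python
import re

LINE = re.compile(r"^\[(\w+)\]\s+(.*?):\s+FPS:\s+(\d+)\s+FrameTime:\s+([\d.]+)\s+ms")


def parse_glmark2(text: str) -> dict:
    """Map (scene, options) -> FPS for every result line in glmark2 output."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            results[(match.group(1), match.group(2))] = int(match.group(3))
    return results


def compare(baseline_text: str, other_text: str) -> list:
    """Return (scene, options, baseline_fps, other_fps, percent_change) rows."""
    base, other = parse_glmark2(baseline_text), parse_glmark2(other_text)
    rows = []
    for key, fps in base.items():
        if key in other:
            change = 100.0 * (other[key] - fps) / fps
            rows.append((key[0], key[1], fps, other[key], change))
    return rows


GEN3 = """\
[build] use-vbo=false: FPS: 369 FrameTime: 2.717 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 130 FrameTime: 7.693 ms
[ideas] speed=duration: FPS: 1698 FrameTime: 0.589 ms
"""
GEN1 = """\
[build] use-vbo=false: FPS: 287 FrameTime: 3.485 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 81 FrameTime: 12.470 ms
[ideas] speed=duration: FPS: 1535 FrameTime: 0.652 ms
"""

if __name__ == "__main__":
    for scene, opts, gen3, gen1, pct in compare(GEN3, GEN1):
        print(f"{scene:8s} gen3={gen3:5d} gen1={gen1:5d} {pct:+6.1f}%  {opts}")
```

Feeding the full output of both runs through compare() gives the same kind of row for every test.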
SuperTuxKart was CPU limited, but the GPU was at about 80% rendering at 3840x2160. Since I removed most of the log output, I don't know how much the alignment trap contributed to the CPU load, but it probably had a bit of an impact.
[verbose ] profile: Number of frames: 8420 time 70.956001, Average FPS: 118.665085
[verbose ] profile: Average # drawn nodes 0.000000 k
[verbose ] profile: Average # culled nodes: 0.000000 k
[verbose ] profile: Average # solid nodes: 0.000000 k
[verbose ] profile: Average # transparent nodes: 0.000000
[verbose ] profile: Average # transp. effect nodes: 0.000000
[verbose ] profile: name start_position end_position time average_speed top_speed skid_time rescue_time rescue_count brake_count explosion_time explosion_count bonus_count banana_count small_nitro_count large_nitro_count bubblegum_count
[verbose ] profile: gavroche Skidding 1 3 61.3454 16.1741 20 0 0 0 315 0 0 1 0 2 0 0 358
[verbose ] profile: puffy Skidding 2 1 58.9141 16.8416 21.3464 0 0 0 482 0 0 6 0 1 0 0 320
[verbose ] profile: konqi Skidding 3 4 65.1309 15.234 15.8408 0 0 0 486 0 0 3 0 4 0 0 361
[verbose ] profile: tux Skidding 4 2 60.2146 16.4778 25.8022 0 0 0 399 0 0 3 0 1 0 0 574
[verbose ] profile: min 58.914093 max 65.130867 av 61.401222
[verbose ] profile:
[verbose ] profile: name Strt End Time AvSp Top Skid Resc Rsc Brake Expl Exp Itm Ban SNitLNit Bub Off Energy
[verbose ] profile: Skidding 1 3 61.35 16.17 20.00 0.00 0.00 0 315 0.00 0 1 0 2 0 0 358 0.00
[verbose ] profile: Skidding 2 1 58.91 16.84 21.35 0.00 0.00 0 482 0.00 0 6 0 1 0 0 320 1.00
[verbose ] profile: Skidding 3 4 65.13 15.23 15.84 0.00 0.00 0 486 0.00 0 3 0 4 0 0 361 4.00
[verbose ] profile: Skidding 4 2 60.21 16.48 25.80 0.00 0.00 0 399 0.00 0 3 0 1 0 0 574 1.00
[verbose ] profile: ---------------------------------------------------------------------------------------------------
[verbose ] profile: Skidding +0 61.4012 0.00 0.00 0 1682 0.00 0 13 0 8 0 0 1613 6.00
OpenCL based applications unfortunately are likely rather difficult to run with this card, at least from my experience. It worked fine in my desktop, but I only got it to work with the proprietary OpenCL driver (and I think ROCm dropped support for polaris?)
There are some more instructions left that I know cause issues with some unity games, but a lot of things should work again now.
@Coreforge Awesome, thanks! Could you also try running it windowed inside the wayfire-pi environment too? (If you get a chance.)
I can try, though I had some missing libs last time I tried 3D stuff inside wayfire.
Just for a frame of reference, on my RX 570 on x86_64 desktop I get ~3200 points at the same resolution. So I think you are getting the full performance in that benchmark since the RX 460 is supposed to be a bit less than 1/2 as powerful as the RX 570. Do you have a reference test of that RX 460 on a desktop x86_64 computer?
Also just a suggestion, I imagine you have trouble using the raspberry pi patched and built chromium with anything other than the pi4/pi5 videocore gpus. You will probably have better luck with vanilla chromium either from building chromium from source, using the chromium flatpak or snap (which are vanilla chromium without any notable patches), or using my chromium debs that you can find here -> https://github.com/theofficialgman/testing/releases/tag/gmans-releases (latest version chromium-browser-stable_119.0.6045.199-1_arm64.deb) (note I don't suggest using my chromium debs all the time since I have specifically patched out libVPX support to use ffmpeg for vp9 hardware decoding on nvidia tegra systems, I use these debs to repackage chromium for Switchroot Nintendo Switch linux distros).
I didn't do benchmarks on x86 with the card, but since it was showing 100% utilization in most tests, that should be the full performance in those. I haven't run any browsers so far, but I'll keep that in mind if I do run into any issues.
Fixing SIMD instructions seems to be more complicated, as the SIMD registers don't get saved as part of the pt_regs struct. I'm currently using fpsimd_save_state() to save them to a temporary location and just assume that nothing in between the exception and that part in the handler changes those registers (which should be the case), but even though unity games work somewhat now, they segfault sometimes, and not always at the same point.
The issues always occur in the UnityGfxDeviceW thread, which isn't a big surprise, but I'm not exactly sure what's causing them; I need to do more debugging on that. It does look like I'm not properly fixing some instructions though, as there are graphical glitches in the menu of Getting Over It, and it's even worse in-game. I have it narrowed down to a 128bit unsigned immediate str, but other than maybe not getting the correct data to write to memory (if fpsimd_save_state() doesn't work properly for this), I'm not really sure what's causing these problems.
What I was doing before was definitely not working.
I changed to read the vector registers from current->thread.uw.fpsimd_state.vregs, and call kernel_neon_begin() before reading the data to ensure it's saved (and then kernel_neon_end() afterwards), and it's certainly looking better (at least the main menu of Getting Over It is), but Getting Over It still crashes in the same place. Looking around with GDB, I couldn't find anything obvious either. It is related to SIMD instruction fixes, as I saw a null pointer dereference when I changed it to just write 0 for all SIMD fixups, but I'm not sure where exactly I'm doing something wrong. I pushed my current code again though.
I tried a few more games that I could run as well: DOOM 2016 with OpenGL can at least get to the menu (I haven't tried further), but the performance isn't too great. (CPU bound, might also be affected by the logging going on). Both DOOM 2016 and DOOM Eternal don't launch with Vulkan, they complain about being unable to initialize vulkan (although I know both work with this card, and vulkan works fine on the pi too). The Talos Principle runs, although performance isn't too great here either. It's also CPU bound, and it wasn't spamming dmesg (though it could still be triggering the alignment trap a lot). This game uses Vulkan.
I also did two runs of GravityMark, one on vulkan and one on opengl. The vulkan one ran and looked just fine, the opengl one was triggering the alignment trap a lot again with SIMD instructions, and the earth was flickering or not there at all for parts of the run.
I saw that there were (likely) two instructions causing problems:
[ 6964.423429] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68438038
[ 6964.457330] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6842529c
[ 6964.491163] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f68401f9c
[ 6964.491173] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68401fb8
[ 6964.526794] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f6840cc1c
[ 6964.526804] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6840cc38
[ 6964.560762] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f6841981c
[ 6964.560773] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68419838
[ 6964.594807] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6842869c
[ 6964.628703] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f684353f4
[ 6964.664459] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f68411f9c
I've been mostly focused on the 128bit stuff for now (as one of the instructions I saw causing a lot of issues was a 128bit str), so I suspect the 64bit one might not get handled properly.
I also gave monado a try (I wasn't expecting anything too special, as it mainly just needs vulkan, which works fine), and it works fine, though I haven't been able to get anything other than hello_xr to run.
Apparently I forgot to read the second data register for SIMD instructions, so instructions of the form stp qx, qx, [xx] only stored the first register, and not the second one. I also added some code to keep track of which instructions have already been handled at some point and which haven't, to make it easier to find potentially bad ones.
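The effect of that bug can be modelled in a few lines. This is a user-space illustration, not the kernel handler: memory is a bytearray, the registers are plain integers, and the byte-by-byte copy only stands in for whatever access pattern the real fixup uses on unaligned device memory. All names are hypothetical.

```python
def emulate_stp_q(memory: bytearray, addr: int, q_first: int, q_second: int) -> None:
    """Model `stp qA, qB, [xN]`: two 128-bit registers, 32 bytes in total."""
    data = q_first.to_bytes(16, "little") + q_second.to_bytes(16, "little")
    for i, byte in enumerate(data):      # byte-wise, so any alignment is fine
        memory[addr + i] = byte


def emulate_stp_q_buggy(memory: bytearray, addr: int, q_first: int, q_second: int) -> None:
    """The failure described above: the second register is never written."""
    for i, byte in enumerate(q_first.to_bytes(16, "little")):
        memory[addr + i] = byte


if __name__ == "__main__":
    good, bad = bytearray(48), bytearray(48)
    first, second = 0x11 * (2**128 - 1) // 0xFF, 0x22 * (2**128 - 1) // 0xFF
    emulate_stp_q(good, 4, first, second)        # deliberately unaligned address
    emulate_stp_q_buggy(bad, 4, first, second)
    print(good[20:36].hex())
    print(bad[20:36].hex())
```

With the buggy variant the upper 16 bytes of the destination keep their old contents, which is the kind of stale data that could plausibly show up as rendering glitches.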
I added a bunch of instruction tests to my gpu memory access tests (all instructions I've seen get handled when launching Getting Over It or The Long Dark), which helped me find the stp issue. All instructions being tested are now being handled correctly. However, Unity games are still segfaulting, and I don't know why. I might try rendering them with llvmpipe instead to make sure it's actually related to the GPU, although I think it is.