Test GPU (XFX AMD Radeon RX 460 4GB GDDR5)

Open geerlingguy opened this issue 1 year ago • 101 comments

The RX 460 is a Polaris-era AMD GPU. @Coreforge did a good amount of work getting one running, documented in #6.

We broke out this separate issue since the original RX 550 issue is already a bit long, and we are both testing on a Raspberry Pi 5 now, where this card may have more opportunity to shine.

Note: See later in this issue for more updated instructions, for full accelerated 4K rendering and display output.

Using Coreforge's 6.1.x kernel fork, if you recompile the kernel, you'll end up with a working HDMI output, with working console output:

pi@pi5:~ $ neofetch
       _,met$$$$$gg.          pi@pi5 
    ,g$$$$$$$$$$$$$$$P.       ------ 
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux 12 (bookworm) aarch64 
 ,$$P'              `$$$.     Host: Raspberry Pi 5 Model B Rev 1.0 
',$$P       ,ggs.     `$$b:   Kernel: 6.1.62-v8_16k+ 
`d$$'     ,$P"'   .    $$$    Uptime: 8 mins 
 $$P      d$'     ,    $$P    Packages: 1604 (dpkg) 
 $$:      $$.   -    ,d$$'    Shell: bash 5.2.15 
 $$;      Y$b._   _,d$P'      Resolution: 1920x1080 
 Y$$.    `.`"Y$$$$P"'         Terminal: /dev/pts/0 
 `$$b      "-.__              CPU: (4) @ 2.400GHz 
  `Y$$                        GPU: AMD ATI Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X 
   `Y$$.                      Memory: 207MiB / 8053MiB 
     `$$b.
       `Y$$b.                                         
          `"Y$b._                                     
              `"""

I have not been able to get wayfire/lightdm working (it sits there on a blinking cursor screen, and the wireplumber process seems to get stuck on something under the lightdm user). Coreforge was running with X11 and seemed to be able to run glmark2, Minecraft, Portal 1 and 2, and some other games, but is currently running with a PCIe x1 Gen 1 connection.

geerlingguy avatar Nov 23 '23 18:11 geerlingguy

To use radeontop:

sudo apt install -y libdrm-dev libncurses-dev libxcb-dri2-0-dev
git clone https://github.com/clbr/radeontop.git
cd radeontop
make
./radeontop

Since I'm having trouble getting into lightdm / wayfire, it's slightly less useful to me right now though :D

geerlingguy avatar Nov 23 '23 18:11 geerlingguy

After using raspi-config to boot to the CLI instead of the desktop, I tried running:

$ wayfire-pi
II 23-11-23 12:51:00.366 - [src/main.cpp:280] Starting wayfire version 0.7.5
II 23-11-23 12:51:00.366 - [libseat] [libseat/backend/seatd.c:64] Could not connect to socket /run/seatd.sock: No such file or directory
II 23-11-23 12:51:00.366 - [libseat] [libseat/libseat.c:76] Backend 'seatd' failed to open seat, skipping
Bus error

And:

$ startx
... get logged errors ...

$ cat /home/pi/.local/share/xorg/Xorg.0.log 
...
[  1476.387] (II) Applying OutputClass "AMDgpu" options to /dev/dri/card2
[  1476.387] (==) modeset(G0): RGB weight 888
[  1476.387] (==) modeset(G0): Default visual is TrueColor
[  1476.387] (II) Loading sub module "glamoregl"
[  1476.387] (II) LoadModule: "glamoregl"
[  1476.387] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[  1476.387] (II) Module glamoregl: vendor="X.Org Foundation"
[  1476.387] 	compiled for 1.21.1.7, module version = 1.0.1
[  1476.387] 	ABI class: X.Org ANSI C Emulation, version 0.4
[  1476.394] (EE) 
[  1476.395] (EE) Backtrace:
[  1476.397] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x188) [0x5555b82fc668]
[  1476.397] (EE) unw_get_proc_info failed: no unwind info found [-10]
[  1476.397] (EE) 
[  1476.398] (EE) Bus error at address 0x7ffec3a78080
[  1476.398] (EE) 
Fatal server error:
[  1476.398] (EE) Caught signal 7 (Bus error). Server aborting
[  1476.398] (EE) 
[  1476.398] (EE)

geerlingguy avatar Nov 23 '23 18:11 geerlingguy

I grabbed Coreforge's memcpy library:

wget https://gist.githubusercontent.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359/raw/b4848d1da9fff0cfcf7b601713efac1909e408e8/memcpy_unaligned.c

gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
sudo mv memcpy.so /usr/local/lib/memcpy.so
sudo nano /etc/ld.so.preload

# Put the following line inside ld.so.preload:
/usr/local/lib/memcpy.so

That got much further with wayfire...

II 23-11-23 12:57:46.203 - [backend/drm/drm.c:1553] Found connector 'DVI-D-1'
II 23-11-23 12:57:46.203 - [backend/drm/drm.c:1614] connector HDMI-A-3: Requesting modeset
II 23-11-23 12:57:46.203 - [src/core/output-layout.cpp:1098] new output: HDMI-A-3
II 23-11-23 12:57:46.203 - [src/core/output-layout.cpp:537] loaded mode auto
II 23-11-23 12:57:46.231 - [backend/drm/drm.c:734] connector HDMI-A-3: Modesetting with 1920x1080 @ 60.000 Hz
(type equals variant: [type: string, value: toplevel] | (type equals variant: [type: string, value: x-or] & focusable equals variant: [type: bool, value: 1]))
type equals variant: [type: string, value: overlay]
false
false
false
app_id equals variant: [type: string, value: Kodi]
(type equals variant: [type: string, value: toplevel] & floating equals variant: [type: bool, value: 1])
II 23-11-23 12:57:46.288 - [backend/drm/drm.c:1502] Scanning DRM connectors on /dev/dri/card1
II 23-11-23 12:57:46.290 - [backend/drm/drm.c:1553] Found connector 'HDMI-A-1'
II 23-11-23 12:57:46.294 - [backend/drm/drm.c:1553] Found connector 'HDMI-A-2'
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "smart-kvm Multifunction USB Device" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "pwr_button" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "vc4-hdmi-0" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "vc4-hdmi-1" to output (not found in this cursor)
EE 23-11-23 12:57:46.296 - [render/allocator/gbm.c:147] gbm_bo_create failed
EE 23-11-23 12:57:46.296 - [render/swapchain.c:109] Failed to allocate buffer

startx also got further... but I'm not sure what's up, it just ends up not rendering a display through the RX 460 at the point I run it (the system is not locked up however).

[Edit: See the comment later about enabling one of the kernel features so the alignment faults can be fixed.]

geerlingguy avatar Nov 23 '23 19:11 geerlingguy

On the site now: https://pipci.jeffgeerling.com/cards_gpu/xfx-radeon-rx460-4gb.html

geerlingguy avatar Nov 23 '23 19:11 geerlingguy

Was there anything in dmesg when running wayfire or X11? I saw you had some issues compiling in #6. compat_alignment.c might only get compiled if Kernel Features -> Kernel support for 32bit EL0 -> Fix up misaligned multi-word loads and stores in user space is enabled (I should probably move the code into a separate file, as that option is disabled by default).

There might be something in newer Mesa versions that doesn't get entirely fixed by the memcpy library and is now causing issues with startx as well, as I could get that running before without additional alignment handling in the kernel. Wayfire was triggering the alignment trap a few times though, so it currently won't work without the kernel fixups.

If it's still getting stuck somewhere (with the alignment trap), dmesg will likely get spammed with essentially the same error over and over again. To add the relevant instruction(s), I'd need at least the Faulting instruction: line, and ideally the Load/Store: op0.... line as well if it's there. I'm currently just adding instructions as I encounter issues, as there are quite a lot of load/store instructions on arm64.

My card has 4 GB of VRAM as well, so that's not an issue.
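For anyone who wants to confirm that the option is actually enabled on the kernel they booted, here is a minimal Python sketch. It rests on two assumptions that are not stated in this thread: that the Kconfig symbol behind that menu entry is CONFIG_COMPAT_ALIGNMENT_FIXUPS, and that the kernel exposes its config at /proc/config.gz (some kernels only do so after the configs module is loaded). The helper names are made up for illustration.

```python
import gzip
import os

# Assumed Kconfig symbol for "Fix up misaligned multi-word loads and stores
# in user space"; the thread only gives the menu path, not the symbol name.
OPTION = "CONFIG_COMPAT_ALIGNMENT_FIXUPS"


def option_enabled(config_text, option=OPTION):
    """Return True if `option` is built in (=y) in a kernel config dump."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())


def read_running_config(path="/proc/config.gz"):
    """Return the running kernel's config as text, or None if unavailable."""
    if not os.path.exists(path):
        return None
    try:
        with gzip.open(path, "rt") as fh:
            return fh.read()
    except OSError:
        return None


config = read_running_config()
if config is None:
    print("No readable /proc/config.gz; check the .config used for the build instead")
else:
    print(f"{OPTION} enabled: {option_enabled(config)}")
```

If /proc/config.gz is not available, the same option_enabled() helper can be pointed at the text of the .config file from the kernel build directory.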

Coreforge avatar Nov 23 '23 23:11 Coreforge

@Coreforge - indeed, after enabling that flag, I can compile (with a number of warnings), rebooting now...

Running wayfire-pi, while the environment initializes, I see:

[   40.300504] Alignment fixup
[   40.300510] Faulting instruction: 0xa9001444
[   40.300513] Load/Store
[   40.300515] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[   40.300517] Storing 8 bytes (pair: 1) to 0x7fff5056016c
[   40.309090] Alignment fixup
[   40.309098] Faulting instruction: 0xa9000c22
[   40.309101] Load/Store
[   40.309102] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x3
[   40.309105] Storing 8 bytes (pair: 1) to 0x7fff50568e7c
[   41.159727] Alignment fixup
[   41.159732] Faulting instruction: 0xa9001444
[   41.159735] Load/Store
[   41.159737] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[   41.159739] Storing 8 bytes (pair: 1) to 0x7fff5056056c
[   41.289474] Alignment fixup
[   41.289486] Faulting instruction: 0xa9001444
[   41.289490] Load/Store
[   41.289491] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1

wayfire-pi

When I clicked on the Pi menu, I saw:

[   41.289494] Storing 8 bytes (pair: 1) to 0x7fff5056096c
[  132.284968] Alignment fixup
[  132.284976] Faulting instruction: 0xa9001444
[  132.284980] Load/Store
[  132.284983] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1

When I opened up Chromium there were maybe 30 or so fixups.

I ran glmark2 and it was spitting out hundreds (?) of fixups per second—certainly a huge number was running through. It seemed to fail during [ideas], but it got through a bit before doing so... getting over 2,000 fps during a number of tests.

glmark2

It did give the GPU some work, though!

Not enough to kick in the internal fans it seems... I think they work :P (the fun of testing used hardware...).

The last time I installed Minecraft on a Pi I just used Pi-Apps — is there a preferred place where you grab it?

geerlingguy avatar Nov 27 '23 22:11 geerlingguy

Were there a bunch of fixup messages in dmesg without the Storing %d bytes message when glmark2 failed? (They might get mixed up a bit, but if it's getting stuck on an instruction, there should be a lot of other messages without that one.) The fans were occasionally spinning up on my card, but not very much, so depending on the fan curve, cooler, and power profile on your card, it might just not get warm enough with these loads. I'm just running Minecraft from a Technic install where I replaced the native libraries with ARM versions and echoed out the actual launch command (since the launcher would otherwise overwrite the libraries again with x86 versions). I think some launchers directly support ARM now, but I haven't tried any in a while.
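One way to answer that question from a saved dmesg dump is to tally, per faulting instruction, how many fixup attempts were followed by a Storing ... bytes line and how many were not. The sketch below is only an illustration: it assumes the log format shown in this thread, that a Storing line always belongs to the most recent Faulting instruction line, and the function name is hypothetical. Interleaved or dropped kernel messages will skew the counts.

```python
import re
from collections import Counter

INSN_RE = re.compile(r"Faulting instruction: (0x[0-9a-fA-F]+)")
STORE_RE = re.compile(r"Storing \d+ bytes")


def summarize_fixups(dmesg_text):
    """Count fixups per instruction, split by whether a store was logged."""
    completed = Counter()    # fixups followed by a "Storing N bytes" line
    incomplete = Counter()   # fixups with no such line before the next fault
    pending = None
    for line in dmesg_text.splitlines():
        match = INSN_RE.search(line)
        if match:
            if pending is not None:
                incomplete[pending] += 1
            pending = match.group(1)
        elif pending is not None and STORE_RE.search(line):
            completed[pending] += 1
            pending = None
    if pending is not None:
        incomplete[pending] += 1
    return completed, incomplete


# A few lines in the format pasted in this thread.
SAMPLE = """\
[  564.557069] Faulting instruction: 0xf8226865
[  564.557071] Load/Store
[  564.557072] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557074] Alignment fixup
[  564.557075] Faulting instruction: 0xf8226865
[  564.557078] Load/Store
[  564.560311] Alignment fixup
[  564.560317] Faulting instruction: 0xa9001444
[  564.560320] Load/Store
[  564.560321] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[  564.560324] Storing 8 bytes (pair: 1) to 0x7fff3831296c
"""

completed, incomplete = summarize_fixups(SAMPLE)
print("completed:", dict(completed))
print("incomplete:", dict(incomplete))
```

An instruction that shows up only in the incomplete tally, and in large numbers, is a likely candidate for one the handler does not support yet.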

Coreforge avatar Nov 28 '23 08:11 Coreforge

SuperTuxKart on max settings would probably be a good benchmark for these cards, as it's arm64 native, OpenGL/GLES based, and in the Raspbian repos.

qwertychouskie avatar Nov 28 '23 19:11 qwertychouskie

A few notes:

  • Fans spin up (slowly then fast briefly) at boot, so at least I know they work :)
  • Haven't tried games yet, but SuperTuxKart on high would be good to see, for sure (cc @qwertychouskie)
  • For some reason the mouse cursor disappears once I move the mouse at all. Makes using the GUI a bit fun :D (Does this happen to you too @Coreforge? ah... I see it did)
  • I've pasted below the last few dozen messages from my last failed glmark2 run:
[  564.557038] Faulting instruction: 0xf8226865
[  564.557039] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[  564.557039] Load/Store
[  564.557041] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557043] Alignment fixup
[  564.557044] Faulting instruction: 0xf8226865
[  564.557046] Load/Store
[  564.557047] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557049] Alignment fixup
[  564.557051] Faulting instruction: 0xf8226865
[  564.557052] Load/Store
[  564.557053] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557055] Alignment fixup
[  564.557057] Faulting instruction: 0xf8226865
[  564.557058] Load/Store
[  564.557059] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557062] Alignment fixup
[  564.557063] Faulting instruction: 0xf8226865
[  564.557064] Load/Store
[  564.557065] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557068] Alignment fixup
[  564.557069] Faulting instruction: 0xf8226865
[  564.557071] Load/Store
[  564.557072] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557074] Alignment fixup
[  564.557075] Faulting instruction: 0xf8226865
[  564.557077] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[  564.557078] Load/Store
[  564.557146] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557254] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[  564.560311] Alignment fixup
[  564.560317] Faulting instruction: 0xa9001444
[  564.560320] Load/Store
[  564.560321] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[  564.560324] Storing 8 bytes (pair: 1) to 0x7fff3831296c
[  564.760582] systemd[1]: Started systemd-journald.service - Journal Service.

It was stuck on [ideas] both times I think. (The blue wireframey wavey one)

geerlingguy avatar Nov 28 '23 22:11 geerlingguy

This thread may get a little more activity, I just posted a video: You can use external GPUs on the Raspberry Pi 5.

geerlingguy avatar Nov 29 '23 00:11 geerlingguy

The last time I installed Minecraft on a Pi I just used Pi-Apps — is there a preferred place where you grab it?

@geerlingguy Using Pi-Apps to install Minecraft (Minecraft Bedrock, Minecraft Java with Prism Launcher, and Minecraft Pi) will work well. All of them are native ARM64. I would love to see Minecraft Java with Prism Launcher running on that setup, hopefully with the Simply Optimized modpack or similar.

theofficialgman avatar Nov 29 '23 00:11 theofficialgman

I need to try this myself at some point (I have an RX 570 8GB, which is also a Polaris card).

I guess one of those mining riser cards like this

https://www.amazon.com/BEYIMEI-VER010-X-Adapter-Bitcoin-Ethereum/dp/B09BVNSFN8?source=ps-sl-shoppingads-lpcontext&ref_=fplfs&smid=A1BM86NEBPKXB0&th=1

plus the m.2 hat should do the trick

https://pineberrypi.com/products/hat-top-2230-2240-for-rpi5

theofficialgman avatar Nov 29 '23 01:11 theofficialgman

Regarding the cursor, according to this YouTube comment I could add the environment variable WLR_NO_HARDWARE_CURSORS=1 to use the software renderer, and that would hopefully keep it visible for now :)

geerlingguy avatar Nov 29 '23 03:11 geerlingguy

The faulting instruction seems to just be a 64-bit store, which I haven't added yet because I hadn't encountered it. I'll hopefully get it added in the next few days.
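To see how the Load/Store: op0 ... line relates to the raw instruction word, the fields can be pulled out with a few shifts and masks. This is a hedged sketch, not the code from the kernel fork: the bit positions follow my reading of the A64 "loads and stores" encoding group, and the size and store/load interpretation only applies to the integer load/store register forms (bit 26 clear). The function names are hypothetical.

```python
def ldst_fields(insn):
    """Split a 32-bit A64 instruction word into the load/store decode fields."""
    return {
        "op0": (insn >> 28) & 0xF,    # bits 31:28
        "op1": (insn >> 26) & 0x1,    # bit 26 (set for SIMD/FP registers)
        "op2": (insn >> 23) & 0x3,    # bits 24:23
        "op3": (insn >> 16) & 0x3F,   # bits 21:16
        "op4": (insn >> 10) & 0x3,    # bits 11:10
    }


def format_like_dmesg(insn):
    """Render the fields in the same shape as the log lines in this thread."""
    fields = ldst_fields(insn)
    return "Load/Store: " + " ".join(f"{k} {v:#x}" for k, v in fields.items())


def access_bytes(insn):
    """Access size for integer load/store register forms: 1 << bits 31:30."""
    return 1 << ((insn >> 30) & 0x3)


def is_store(insn):
    """For integer load/store register forms, opc (bits 23:22) == 0 is a store."""
    return ((insn >> 22) & 0x3) == 0


insn = 0xF8226865  # the instruction from the failed glmark2 run
print(format_like_dmesg(insn))
print(f"{access_bytes(insn) * 8}-bit {'store' if is_store(insn) else 'load'}")
```

Running it on 0xf8226865 reproduces the op0 0xf ... op3 0x22 op4 0x2 line from the log and reports a 64-bit store, which matches the description above.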

Coreforge avatar Nov 29 '23 06:11 Coreforge

If PCIe on the Pi 5 is anything like on the LX2160A, the GPU might fall off the bus if the PCIe link rate changes as a power-savings feature. One way to work around that is to set amdgpu.pcie_gen_cap to 0x10001 for gen1, 0x20002 for gen2, or 0x40004 for gen3. While you're at it, you might try amdgpu.aspm=0.
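The three pcie_gen_cap values above follow a simple pattern: the same generation bit appears once in the low 16 bits and once in the high 16 bits. The sketch below builds the value and a kernel command line fragment from a generation number. Treat the bit-field reading as an assumption inferred from the pattern (only the three literal values come from the comment above), and the function names as hypothetical.

```python
def pcie_gen_cap(gen):
    """Return the amdgpu.pcie_gen_cap value for PCIe gen 1, 2, or 3."""
    if gen not in (1, 2, 3):
        raise ValueError("only gen 1, 2, and 3 values are given in the thread")
    bit = 1 << (gen - 1)
    # Same generation bit in both 16-bit halves: 0x10001, 0x20002, 0x40004.
    return (bit << 16) | bit


def amdgpu_cmdline(gen, disable_aspm=True):
    """Build module parameters suitable for appending to the kernel command line."""
    params = [f"amdgpu.pcie_gen_cap={pcie_gen_cap(gen):#x}"]
    if disable_aspm:
        params.append("amdgpu.aspm=0")
    return " ".join(params)


for gen in (1, 2, 3):
    print(amdgpu_cmdline(gen))
```

The first line printed is amdgpu.pcie_gen_cap=0x10001 amdgpu.aspm=0, which would pin the driver to gen1 and turn off ASPM as suggested.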

There's also a double-negative amdgpu.noretry=0 to enable retries of... something, I don't know exactly what.

I'm also curious if amdgpu's HDMI audio sounds correct on the Pi. On the LX2160A, the audio comes out crackly and garbled.

DanaGoyette avatar Nov 29 '23 07:11 DanaGoyette

SuperTuxKart on max settings would probably be a good benchmark for these cards, as it's arm64 native, OpenGL/GLES based, and in the Raspbian repos.

Another good thing to try is the game Veloren. The launcher for it is available as a Flatpak: net.veloren.airshipper. It's a game that supports Vulkan, DX12, and Metal, with native ARM64 versions for Linux and macOS.

If you create a world with a fixed non-zero seed, and then Spectate World, I think you should end up with the same map viewed from about the same place, possibly with different weather at a given moment.

DanaGoyette avatar Nov 29 '23 07:11 DanaGoyette

seems like AMD is also working on it, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.7-rc3&id=ba0fb4b48c19a2d2380fc16ca4af236a0871d279

Did you talk about it on amd-gfx mail list?

stalkerg avatar Nov 29 '23 08:11 stalkerg

I got the uPCity today (Thanks to the Pineberry people again!), but haven't had too much time to do benchmarks yet. glmark2 is now also crashing for me on the ideas test with the same instruction, so I guess I just had some old library that got updated when I installed wayfire that did something differently. Minecraft has a similar issue, so until I get that added, I only have this partial run of glmark2. Since I didn't do another run with the old adapter on the same libraries, I don't think the numbers can be compared directly. I saw a GPU utilization of 95% through a good part of the run though, so it doesn't seem like it's getting CPU limited (and toning down the logging should also help with that).

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 342 FrameTime: 2.929 ms
[build] use-vbo=true: FPS: 1799 FrameTime: 0.556 ms
[texture] texture-filter=nearest: FPS: 1824 FrameTime: 0.548 ms
[texture] texture-filter=linear: FPS: 1875 FrameTime: 0.533 ms
[texture] texture-filter=mipmap: FPS: 1866 FrameTime: 0.536 ms
[shading] shading=gouraud: FPS: 1750 FrameTime: 0.572 ms
[shading] shading=blinn-phong-inf: FPS: 1770 FrameTime: 0.565 ms
[shading] shading=phong: FPS: 1741 FrameTime: 0.574 ms
[shading] shading=cel: FPS: 1734 FrameTime: 0.577 ms
[bump] bump-render=high-poly: FPS: 1755 FrameTime: 0.570 ms
[bump] bump-render=normals: FPS: 1810 FrameTime: 0.553 ms
[bump] bump-render=height: FPS: 1821 FrameTime: 0.549 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1411 FrameTime: 0.709 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 508 FrameTime: 1.972 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1434 FrameTime: 0.698 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 523 FrameTime: 1.913 ms
[desktop] effect=shadow:windows=4: FPS: 904 FrameTime: 1.107 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 123 FrameTime: 8.158 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 136 FrameTime: 7.383 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 158 FrameTime: 6.364 ms

I'll try SuperTuxKart as well once I get everything that was previously working running again.

Coreforge avatar Nov 29 '23 21:11 Coreforge

Hello, one benchmark that I would suggest is GravityMark. It has support for modern features such as ray tracing and I think that it has a native AArch64 build for all major desktop operating systems - certainly for Linux (I have tried it myself on an Ampere eMAG machine).

Geekbench's compute benchmark is another option, covering a different GPU use case.

volyrique avatar Nov 29 '23 21:11 volyrique

Geekbench's compute benchmark is another option, covering a different GPU use case.

@volyrique Geekbench compute (at least the vulkan backend) isn't in the linux arm64 geekbench 5/6 builds. It is only in the x86_64 builds. I have been asking jfpoole to add it for a year now but they won't add it for some reason. You can run the geekbench 5/6 x86_64 compute benchmark through box64 though and it does work fully with (probably) minimal overhead. That is what I did to get the geekbench vulkan compute benchmark results here -> https://forums.raspberrypi.com/viewtopic.php?p=2144650#p2140061 on Pi4 and Pi5

theofficialgman avatar Nov 29 '23 23:11 theofficialgman

That still leaves OpenCL as an option, doesn't it?

volyrique avatar Dec 01 '23 01:12 volyrique

glmark2 on mostly updated packages at gen3 speeds:

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 369 FrameTime: 2.717 ms
[build] use-vbo=true: FPS: 2086 FrameTime: 0.480 ms
[texture] texture-filter=nearest: FPS: 2082 FrameTime: 0.480 ms
[texture] texture-filter=linear: FPS: 2078 FrameTime: 0.481 ms
[texture] texture-filter=mipmap: FPS: 2069 FrameTime: 0.484 ms
[shading] shading=gouraud: FPS: 1861 FrameTime: 0.537 ms
[shading] shading=blinn-phong-inf: FPS: 1863 FrameTime: 0.537 ms
[shading] shading=phong: FPS: 1861 FrameTime: 0.537 ms
[shading] shading=cel: FPS: 1862 FrameTime: 0.537 ms
[bump] bump-render=high-poly: FPS: 1868 FrameTime: 0.535 ms
[bump] bump-render=normals: FPS: 2105 FrameTime: 0.475 ms
[bump] bump-render=height: FPS: 2095 FrameTime: 0.477 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1411 FrameTime: 0.709 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 511 FrameTime: 1.959 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1501 FrameTime: 0.666 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 521 FrameTime: 1.923 ms
[desktop] effect=shadow:windows=4: FPS: 904 FrameTime: 1.107 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 130 FrameTime: 7.693 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 139 FrameTime: 7.212 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 165 FrameTime: 6.080 ms
[ideas] speed=duration: FPS: 1698 FrameTime: 0.589 ms
[jellyfish] <default>: FPS: 1015 FrameTime: 0.986 ms
[terrain] <default>: FPS: 126 FrameTime: 7.938 ms
[shadow] <default>: FPS: 1551 FrameTime: 0.645 ms
[refract] <default>: FPS: 165 FrameTime: 6.093 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2040 FrameTime: 0.490 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2032 FrameTime: 0.492 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2038 FrameTime: 0.491 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2034 FrameTime: 0.492 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2040 FrameTime: 0.490 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2039 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2041 FrameTime: 0.490 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2028 FrameTime: 0.493 ms
=======================================================
                                  glmark2 Score: 1463 
=======================================================

Most of the benchmarks were running with the GPU at 100%, but some (especially buffer) were heavily CPU bound, with the GPU sitting at only around 20%. That's likely due to the alignment trap getting triggered a lot (it's getting triggered a lot on the other tests too, but likely not nearly as much). Optimizing the trap would likely improve it a bit, but the better option would be finding which library/function is causing those issues and fixing that (since it didn't happen before I updated a bunch of stuff, it should be some system library, probably some part of mesa).

And at gen1 speeds:

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 287 FrameTime: 3.485 ms
[build] use-vbo=true: FPS: 2078 FrameTime: 0.481 ms
[texture] texture-filter=nearest: FPS: 2080 FrameTime: 0.481 ms
[texture] texture-filter=linear: FPS: 2081 FrameTime: 0.481 ms
[texture] texture-filter=mipmap: FPS: 2072 FrameTime: 0.483 ms
[shading] shading=gouraud: FPS: 1863 FrameTime: 0.537 ms
[shading] shading=blinn-phong-inf: FPS: 1862 FrameTime: 0.537 ms
[shading] shading=phong: FPS: 1857 FrameTime: 0.539 ms
[shading] shading=cel: FPS: 1858 FrameTime: 0.538 ms
[bump] bump-render=high-poly: FPS: 1864 FrameTime: 0.537 ms
[bump] bump-render=normals: FPS: 2097 FrameTime: 0.477 ms
[bump] bump-render=height: FPS: 2086 FrameTime: 0.479 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1401 FrameTime: 0.714 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 509 FrameTime: 1.967 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1506 FrameTime: 0.664 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 514 FrameTime: 1.949 ms
[desktop] effect=shadow:windows=4: FPS: 884 FrameTime: 1.131 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 81 FrameTime: 12.470 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 136 FrameTime: 7.406 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 100 FrameTime: 10.057 ms
[ideas] speed=duration: FPS: 1535 FrameTime: 0.652 ms
[jellyfish] <default>: FPS: 1012 FrameTime: 0.989 ms
[terrain] <default>: FPS: 126 FrameTime: 7.964 ms
[shadow] <default>: FPS: 1549 FrameTime: 0.646 ms
[refract] <default>: FPS: 165 FrameTime: 6.097 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2040 FrameTime: 0.490 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2037 FrameTime: 0.491 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2035 FrameTime: 0.491 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2038 FrameTime: 0.491 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2037 FrameTime: 0.491 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2038 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2035 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2039 FrameTime: 0.491 ms
=======================================================
                                  glmark2 Score: 1450 
=======================================================

The GPU utilization was at 100% through most of the run as well. Most of the scores are the same, but some benchmarks (mainly mapped buffer, which transfers a lot of data) were affected quite a bit.
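To put numbers on that, here is a small Python sketch that compares a handful of results copied from the two runs above (gen3 first, gen1 second). The selection of tests and the shortened labels are my own; the FPS figures and scores are taken from the output as posted.

```python
# (gen3, gen1) results copied from the two glmark2 runs above.
RESULTS = {
    "build use-vbo=false":          (369, 287),
    "buffer map":                   (130, 81),
    "buffer subdata":               (139, 136),
    "buffer map, interleave=true":  (165, 100),
    "ideas":                        (1698, 1535),
    "glmark2 Score":                (1463, 1450),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for name, (gen3, gen1) in RESULTS.items():
    print(f"{name:30s} {gen3:5d} -> {gen1:5d}  ({percent_change(gen3, gen1):+.1f}%)")
```

The mapped-buffer tests lose roughly 38 to 39 percent at gen1, while the overall score drops by under 1 percent, which fits the observation that only the transfer-heavy tests are noticeably affected.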

SuperTuxKart was CPU limited, but the GPU was at about 80% rendering at 3840x2160. Since I removed most of the log output, I don't know how much the alignment trap contributed to the CPU load, but it probably had a bit of an impact.

[verbose  ] profile: Number of frames: 8420 time 70.956001, Average FPS: 118.665085
[verbose  ] profile: Average # drawn nodes           0.000000 k
[verbose  ] profile: Average # culled nodes:         0.000000 k
[verbose  ] profile: Average # solid nodes:          0.000000 k
[verbose  ] profile: Average # transparent nodes:    0.000000
[verbose  ] profile: Average # transp. effect nodes: 0.000000
[verbose  ] profile: name start_position end_position time average_speed top_speed skid_time rescue_time rescue_count brake_count explosion_time explosion_count bonus_count banana_count small_nitro_count large_nitro_count bubblegum_count
[verbose  ] profile: gavroche Skidding 1 3 61.3454 16.1741 20 0 0 0 315 0 0 1 0 2 0 0 358 
[verbose  ] profile: puffy Skidding 2 1 58.9141 16.8416 21.3464 0 0 0 482 0 0 6 0 1 0 0 320 
[verbose  ] profile: konqi Skidding 3 4 65.1309 15.234 15.8408 0 0 0 486 0 0 3 0 4 0 0 361 
[verbose  ] profile: tux Skidding 4 2 60.2146 16.4778 25.8022 0 0 0 399 0 0 3 0 1 0 0 574 
[verbose  ] profile: min 58.914093  max 65.130867  av 61.401222

[verbose  ] profile: 
[verbose  ] profile: name     Strt End  Time    AvSp  Top   Skid  Resc Rsc Brake Expl Exp Itm Ban SNitLNit Bub Off Energy
[verbose  ] profile: Skidding    1   3  61.35 16.17 20.00   0.00 0.00   0   315 0.00   0   1   0   2   0   0   358 0.00
[verbose  ] profile: Skidding    2   1  58.91 16.84 21.35   0.00 0.00   0   482 0.00   0   6   0   1   0   0   320 1.00
[verbose  ] profile: Skidding    3   4  65.13 15.23 15.84   0.00 0.00   0   486 0.00   0   3   0   4   0   0   361 4.00
[verbose  ] profile: Skidding    4   2  60.21 16.48 25.80   0.00 0.00   0   399 0.00   0   3   0   1   0   0   574 1.00
[verbose  ] profile: ---------------------------------------------------------------------------------------------------
[verbose  ] profile: Skidding   +0      61.4012             0.00 0.00   0  1682 0.00   0  13   0   8   0   0  1613 6.00

OpenCL-based applications are unfortunately likely to be rather difficult to run with this card, at least from my experience. It worked fine in my desktop, but I only got it to work with the proprietary OpenCL driver (and I think ROCm dropped support for Polaris?).

There are some more instructions left that I know cause issues with some Unity games, but a lot of things should work again now.

Coreforge avatar Dec 01 '23 23:12 Coreforge

@Coreforge Awesome, thanks! Could you also try running it windowed inside the wayfire-pi environment too? (If you get a chance).

geerlingguy avatar Dec 02 '23 00:12 geerlingguy

I can try, though I had some missing libs last time I tried 3D stuff inside wayfire.

Coreforge avatar Dec 02 '23 00:12 Coreforge

Just for a frame of reference, on my RX 570 on x86_64 desktop I get ~3200 points at the same resolution. So I think you are getting the full performance in that benchmark since the RX 460 is supposed to be a bit less than 1/2 as powerful as the RX 570. Do you have a reference test of that RX 460 on a desktop x86_64 computer?
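As a quick sanity check of that estimate, the ratio can be worked out from the two scores quoted in this thread. The 3200 figure is the approximate RX 570 result mentioned above, and 1463 is the gen3 glmark2 score posted earlier, so the result is only a rough indication.

```python
rx570_x86_score = 3200   # approximate score quoted above for the RX 570
rx460_pi5_score = 1463   # glmark2 score from the gen3 run on the Pi 5

ratio = rx460_pi5_score / rx570_x86_score
print(f"RX 460 on the Pi 5 reaches {ratio:.1%} of the RX 570 desktop score")
```

That works out to a bit under half, in line with the expectation stated above.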

theofficialgman avatar Dec 02 '23 00:12 theofficialgman

Also, just a suggestion: I imagine you'll have trouble using the Raspberry Pi patched and built Chromium with anything other than the Pi 4/Pi 5 VideoCore GPUs. You will probably have better luck with vanilla Chromium, either by building Chromium from source, using the Chromium Flatpak or Snap (which are vanilla Chromium without any notable patches), or using my Chromium debs that you can find here -> https://github.com/theofficialgman/testing/releases/tag/gmans-releases (latest version chromium-browser-stable_119.0.6045.199-1_arm64.deb). Note that I don't suggest using my Chromium debs all the time, since I have specifically patched out libVPX support to use ffmpeg for VP9 hardware decoding on Nvidia Tegra systems; I use these debs to repackage Chromium for Switchroot Nintendo Switch Linux distros.

theofficialgman avatar Dec 02 '23 00:12 theofficialgman

I didn't do benchmarks on x86 with the card, but since it was showing 100% utilization in most tests, that should be the full performance in those. I haven't run any browsers so far, but I'll keep that in mind if I do run into any issues.

Coreforge avatar Dec 02 '23 04:12 Coreforge

Fixing SIMD instructions seems to be more complicated, as the SIMD registers don't get saved as part of the pt_regs struct. I'm currently using fpsimd_save_state() to save them to a temporary location and just assuming that nothing between the exception and that part of the handler changes those registers (which should be the case). Even though Unity games work somewhat now, they segfault sometimes, and not always at the same point. The issues always occur in the UnityGfxDeviceW thread, which isn't a big surprise, but I'm not exactly sure what's causing them; I need to do more debugging on that. It does look like I'm not properly fixing some instructions though, as there are graphical glitches in the menu of Getting Over It, and it's even worse in-game. I have it narrowed down to a 128-bit unsigned immediate str, but other than maybe not getting the correct data to write to memory (if fpsimd_save_state() doesn't work properly for this), I'm not really sure what's causing these problems.

Coreforge avatar Dec 07 '23 21:12 Coreforge

What I was doing before was definitely not working. I changed it to read the vector registers from current->thread.uw.fpsimd_state.vregs, calling kernel_neon_begin() before reading the data to ensure it's saved (and kernel_neon_end() afterwards). It's certainly looking better (at least the main menu of Getting Over It is), but Getting Over It still crashes in the same place. Looking around with GDB, I couldn't find anything obvious either. The crash is related to the SIMD instruction fixes, as I saw a null pointer dereference when I changed the handler to just write 0 for all SIMD fixups, but I'm not sure where exactly I'm doing something wrong. I pushed my current code again though.

I tried a few more games that I could run as well. DOOM 2016 with OpenGL can at least get to the menu (I haven't tried further), but the performance isn't too great (CPU bound, and it might also be affected by the logging going on). Neither DOOM 2016 nor DOOM Eternal launches with Vulkan; they complain about being unable to initialize Vulkan (although I know both work with this card, and Vulkan works fine on the Pi too). The Talos Principle, which uses Vulkan, runs, although performance isn't too great here either. It's also CPU bound, and it wasn't spamming dmesg (though it could still be triggering the alignment trap a lot).

I also did two runs of GravityMark, one on Vulkan and one on OpenGL. The Vulkan one ran and looked just fine. The OpenGL one was triggering the alignment trap a lot again with SIMD instructions, and the earth was flickering or not there at all for parts of the run.

I saw that there were (likely) two instructions causing problems:

[ 6964.423429] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68438038
[ 6964.457330] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6842529c
[ 6964.491163] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f68401f9c
[ 6964.491173] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68401fb8
[ 6964.526794] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f6840cc1c
[ 6964.526804] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6840cc38
[ 6964.560762] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f6841981c
[ 6964.560773] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68419838
[ 6964.594807] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6842869c
[ 6964.628703] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f684353f4
[ 6964.664459] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f68411f9c

I've been mostly focused on the 128-bit stores for now (as one of the instructions I saw causing a lot of issues was a 128-bit str), so I suspect the 64-bit one might not get handled properly.
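For reference, the access width of stores like these can be read straight from the instruction encoding. Here's a minimal userspace sketch, assuming the A64 "LDR/STR (immediate, SIMD&FP), unsigned offset" encoding. The struct and function names are made up for illustration and aren't taken from the kernel patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of an A64 "LDR/STR (immediate, SIMD&FP), unsigned offset"
 * instruction. Names are hypothetical, for illustration only. */
struct simd_ldst {
    bool     is_load;  /* true for LDR, false for STR */
    unsigned bytes;    /* access size: 1, 2, 4, 8 or 16 */
    unsigned rt;       /* SIMD register number */
    unsigned rn;       /* base register number (Xn or SP) */
    uint32_t offset;   /* byte offset, already scaled by the access size */
};

/* Returns true and fills *out if insn belongs to this encoding class. */
static bool decode_simd_ldst_uoff(uint32_t insn, struct simd_ldst *out)
{
    /* Bits 29:24 must be 0b111101 (V=1, unsigned-offset form). */
    if (((insn >> 24) & 0x3f) != 0x3d)
        return false;

    unsigned size  = insn >> 30;              /* bits 31:30 */
    unsigned opc   = (insn >> 22) & 0x3;      /* bits 23:22 */
    unsigned scale = ((opc >> 1) << 2) | size; /* opc<1>:size */

    if (scale > 4)
        return false;                         /* unallocated encoding */

    out->is_load = (opc & 1) != 0;
    out->bytes   = 1u << scale;               /* scale 3 -> 8, scale 4 -> 16 */
    out->rt      = insn & 0x1f;
    out->rn      = (insn >> 5) & 0x1f;
    out->offset  = ((insn >> 10) & 0xfff) << scale;
    return true;
}
```

With this, 0x3D800441 (str q1, [x2, #16]) decodes as a 16-byte store, and 0xFD000000 (str d0, [x0]) as an 8-byte store, so both widths seen in the log go through the same decode path and differ only in the size/opc bits.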

I also gave Monado a try (I wasn't expecting anything too special, as it mainly just needs Vulkan, which works fine), and it works, though I haven't been able to get anything other than hello_xr to run.

Coreforge avatar Dec 08 '23 21:12 Coreforge

Apparently I forgot to read the second data register for SIMD instructions, so stp qx, qx, [xx] instructions only stored the first register and not the second one. I also added some code to keep track of which instructions have already been handled at some point and which haven't, to make it easier to find potentially bad ones.
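To illustrate the stp case, here's a rough userspace model of emulating a 128-bit store pair. It assumes the signed-offset form of "STP Qt, Qt2, [Xn, #imm]" and a plain array standing in for the saved vector registers; qreg_t and emulate_stp_q are hypothetical names, not the ones used in the kernel fork.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit SIMD register, kept as raw bytes (hypothetical type). */
typedef struct { uint8_t b[16]; } qreg_t;

/* Emulates "STP Qt, Qt2, [Xn, #imm]" (signed-offset form) on a byte buffer.
 * base stands in for the address held in Xn. Returns false for any other
 * encoding. memcpy is used so the destination may be unaligned. */
static bool emulate_stp_q(uint32_t insn, const qreg_t vregs[32], uint8_t *base)
{
    /* Bits 31:22: opc=10 (128-bit), 101, V=1, 010 (signed offset), L=0. */
    if ((insn & 0xffc00000u) != 0xad000000u)
        return false;

    unsigned rt  = insn & 0x1f;          /* first data register  */
    unsigned rt2 = (insn >> 10) & 0x1f;  /* second data register */

    int32_t imm7 = (int32_t)((insn >> 15) & 0x7f);
    if (imm7 & 0x40)
        imm7 -= 0x80;                    /* sign-extend the 7-bit field */

    uint8_t *dst = base + imm7 * 16;     /* offset is scaled by 16 bytes */

    memcpy(dst,      &vregs[rt],  16);
    memcpy(dst + 16, &vregs[rt2], 16);   /* the half that was being skipped */
    return true;
}
```

Dropping the second memcpy reproduces the symptom described above: the first 16 bytes land correctly and the following 16 bytes keep whatever was in memory before.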

I added a bunch of instruction tests to my gpu memory access tests (all instructions I've seen get handled when launching Getting Over It or The Long Dark), which helped me find the stp issue. All instructions being tested are now being handled correctly. However, Unity games are still segfaulting, and I don't know why. I might try rendering them with llvmpipe instead to make sure it's actually related to the GPU, although I think it is.
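The bookkeeping for which instructions have been handled could look something like the sketch below. This is only a guess at the general shape, not the code from the fork: the table size, the names, and the choice to key on bits 31:22 of the encoding (dropping register and immediate fields) are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEEN_MAX 64  /* arbitrary capacity for this sketch */

/* Small fixed-size table of instruction classes seen so far (hypothetical). */
struct seen_table {
    uint32_t      key[SEEN_MAX];
    unsigned long hits[SEEN_MAX];
    size_t        count;
};

/* Records one occurrence of insn. Returns true only the first time its
 * class is seen; returns false for repeats, and also when the table is
 * full and the new class has to be dropped. */
static bool seen_record(struct seen_table *t, uint32_t insn)
{
    /* Keep bits 31:22 only, so register numbers and offsets don't create
     * separate entries. This mask is an assumption of the sketch. */
    uint32_t key = insn & 0xffc00000u;

    for (size_t i = 0; i < t->count; i++) {
        if (t->key[i] == key) {
            t->hits[i]++;
            return false;
        }
    }
    if (t->count == SEEN_MAX)
        return false;

    t->key[t->count]  = key;
    t->hits[t->count] = 1;
    t->count++;
    return true;
}
```

Logging only when seen_record() returns true would keep dmesg readable while still showing every new kind of instruction that reaches the fixup path.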

Coreforge avatar Jan 12 '24 23:01 Coreforge