Test AMD Radeon AI Pro R9700
The AI Pro R9700 has 32GB of VRAM, and was introduced in 2025.
Getting the card working on Pi OS 13 "Trixie"
See: Using AMD GPUs on Raspberry Pi without recompiling Linux.
It should. I may try to find a way to get one of these to test. The RX 9070 XT works fine (see #766), and the R9700 has a similar architecture, just with more VRAM...
Edit: I should be getting one soon ;)
Test setup:
lspci output:
0001:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 [Radeon AI PRO R9700] (rev c0) (prog-if 00 [VGA controller])
Subsystem: XFX Limited Device 9801
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 188
Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at 1810000000 (64-bit, prefetchable) [size=2M]
Region 5: Memory at 1b80000000 (32-bit, non-prefetchable) [size=512K]
Expansion ROM at 1b80080000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [64] Express (v2) Legacy Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ TEE-IO-
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 32GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 32GT/s, Width x16
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5-32GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 32GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 000000fffffff000 Data: 0009
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [200 v1] Physical Resizable BAR
BAR 0: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
BAR 2: current size: 2MB, supported: 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB
Capabilities: [240 v1] Power Budgeting <?>
Capabilities: [270 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [2d0 v1] Process Address Space ID (PASID)
PASIDCap: Exec+ Priv+, Max PASID Width: 10
PASIDCtl: Enable- Exec- Priv-
Capabilities: [320 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [450 v1] Lane Margining at the Receiver
PortCap: Uses Driver-
PortSta: MargReady+ MargSoftReady-
Capabilities: [500 v1] Physical Layer 32.0 GT/s <?>
Kernel driver in use: amdgpu
Kernel modules: amdgpu
And thanks to Micro Center for helping me on this project — I will be working on some special tests for next month ;)
But I'll continue testing in this issue...
A few benchmarks:
vkmark
WARNING: radv is not a conformant Vulkan implementation, testing use only.
WARNING: radv is not a conformant Vulkan implementation, testing use only.
=======================================================
vkmark 2025.01
=======================================================
Vendor ID: 0x1002
Device ID: 0x7551
Device Name: AMD Radeon Graphics (RADV GFX1201)
Driver Version: 104857607
Device UUID: c284624a6065d2a279721dfe5e3867cf
=======================================================
[vertex] device-local=true: FPS: 17396 FrameTime: 0.057 ms
[vertex] device-local=false: FPS: 757 FrameTime: 1.321 ms
[texture] anisotropy=0: FPS: 17329 FrameTime: 0.058 ms
[texture] anisotropy=16: FPS: 17482 FrameTime: 0.057 ms
[shading] shading=gouraud: FPS: 17510 FrameTime: 0.057 ms
[shading] shading=blinn-phong-inf: FPS: 17402 FrameTime: 0.057 ms
[shading] shading=phong: FPS: 17438 FrameTime: 0.057 ms
[shading] shading=cel: FPS: 17336 FrameTime: 0.058 ms
[effect2d] kernel=edge: FPS: 17602 FrameTime: 0.057 ms
[effect2d] kernel=blur: FPS: 17685 FrameTime: 0.057 ms
[desktop] <default>: FPS: 15897 FrameTime: 0.063 ms
[cube] <default>: FPS: 17477 FrameTime: 0.057 ms
[clear] <default>: FPS: 14055 FrameTime: 0.071 ms
=======================================================
vkmark Score: 15797
=======================================================
glmark2-es2-wayland
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: AMD
GL_RENDERER: AMD Radeon Graphics (radeonsi, gfx1201, ACO, DRM 3.64, 6.17.5-v8-16k+)
GL_VERSION: OpenGL ES 3.2 Mesa 25.0.7-2+rpt3
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 941 FrameTime: 1.063 ms
[build] use-vbo=true: FPS: 9289 FrameTime: 0.108 ms
[texture] texture-filter=nearest: FPS: 10564 FrameTime: 0.095 ms
[texture] texture-filter=linear: FPS: 9515 FrameTime: 0.105 ms
[texture] texture-filter=mipmap: FPS: 10452 FrameTime: 0.096 ms
[shading] shading=gouraud: FPS: 10930 FrameTime: 0.091 ms
[shading] shading=blinn-phong-inf: FPS: 10482 FrameTime: 0.095 ms
[shading] shading=phong: FPS: 9220 FrameTime: 0.108 ms
[shading] shading=cel: FPS: 9452 FrameTime: 0.106 ms
[bump] bump-render=high-poly: FPS: 10325 FrameTime: 0.097 ms
[bump] bump-render=normals: FPS: 10312 FrameTime: 0.097 ms
[bump] bump-render=height: FPS: 10478 FrameTime: 0.095 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 10439 FrameTime: 0.096 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 10355 FrameTime: 0.097 ms
[pulsar] light=false:quads=5:texture=false: FPS: 9750 FrameTime: 0.103 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 6129 FrameTime: 0.163 ms
[desktop] effect=shadow:windows=4: FPS: 6814 FrameTime: 0.147 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 380 FrameTime: 2.636 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 593 FrameTime: 1.687 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 405 FrameTime: 2.473 ms
[ideas] speed=duration: FPS: 4343 FrameTime: 0.230 ms
[jellyfish] <default>: FPS: 8772 FrameTime: 0.114 ms
[terrain] <default>: FPS: 4226 FrameTime: 0.237 ms
[shadow] <default>: FPS: 8124 FrameTime: 0.123 ms
[refract] <default>: FPS: 7164 FrameTime: 0.140 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 10835 FrameTime: 0.092 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 10703 FrameTime: 0.093 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 10293 FrameTime: 0.097 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 10461 FrameTime: 0.096 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 10697 FrameTime: 0.093 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 10472 FrameTime: 0.095 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 9871 FrameTime: 0.101 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 10506 FrameTime: 0.095 ms
=======================================================
glmark2 Score: 8280
=======================================================
GravityMark
Result: 58,587
And here's a video of it running, with the card sucking down quite a bit of power! (There's a lot of variation in coil whine across the different scenes):
https://github.com/user-attachments/assets/fd1bfa12-3654-4650-a48f-47ad18c49797
I just noticed, the link status is reporting:
LnkSta: Speed 32GT/s, Width x16
And in nvtop, I see Gen 2 x1...
I have Gen 3 configured in /boot/firmware/config.txt, and the bridge itself is reporting 8 GT/s:
pi@cm5:~ $ sudo lspci -vv | grep -E 'PCI bridge|LnkCap'
0001:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries BCM2712 PCIe Bridge (rev 30) (prog-if 00 [Normal decode])
LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS+
0001:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 24) (prog-if 00 [Normal decode])
LnkCap: Port #0, Speed 32GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
LnkCap2: Supported Link Speeds: 2.5-32GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
Weird.
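To compare what each hop actually negotiated (rather than what it is capable of), the kernel exposes per-device link attributes in sysfs. This is a minimal sketch, not something run in this thread: `pcie_links` is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed at a test tree. The `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` files are standard sysfs attributes for PCIe devices.

```shell
# Print negotiated vs. maximum PCIe link speed/width for every PCI device.
# $1 (optional) overrides the sysfs directory; the default is the real one.
pcie_links() {
  root="${1:-/sys/bus/pci/devices}"
  for dev in "$root"/*; do
    # Skip anything that does not expose PCIe link attributes.
    [ -r "$dev/current_link_speed" ] || continue
    printf '%s: current %s x%s, max %s x%s\n' \
      "${dev##*/}" \
      "$(cat "$dev/current_link_speed" 2>/dev/null)" \
      "$(cat "$dev/current_link_width" 2>/dev/null)" \
      "$(cat "$dev/max_link_speed" 2>/dev/null)" \
      "$(cat "$dev/max_link_width" 2>/dev/null)"
  done
}

pcie_links
```

One possible reading of the mismatch (an assumption, not confirmed here): the `LnkSta` on the GPU function describes the link behind the card's own PCIe switch, while the line for the Pi's root port (`0001:00:00.0` above) shows the link the Pi actually negotiated.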
Running some AI Benchmarks here: https://github.com/geerlingguy/ai-benchmarks/issues/35
Power draw ranges from 18.2W at idle (CM5 + card in eGPU dock, measured at the wall outlet) to 289.5W at full tilt:
Testing the same card under Ubuntu 25.10 (kernel 6.17.x) on an Intel Core Ultra 7 265K platform (see https://github.com/geerlingguy/sbc-reviews/issues/93):
- vkmark: 40285
- glmark2-es2-wayland: 21145
- GravityMark: 64,469
AI results for both cards: https://github.com/geerlingguy/ai-benchmarks/issues/35
The card has been quite stable. It's interesting to see the difference between running it in a computer with very little limitation and running it on the Pi 5...
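To put a number on that difference, here is a quick sketch that expresses each Pi score as a percentage of the Intel PC score. The scores are copied from the results in this thread; `pi_vs_pc` is just an illustrative helper name.

```shell
# Columns: benchmark name, Pi (CM5) score, Intel PC score.
pi_vs_pc() {
  awk '{ printf "%s: %.1f%%\n", $1, 100 * $2 / $3 }' <<'EOF'
vkmark 15797 40285
glmark2-es2-wayland 8280 21145
GravityMark 58587 64469
EOF
}

pi_vs_pc
```

By this measure vkmark and glmark2 both land at roughly 39% of the PC result, while GravityMark reaches about 91%.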
I want to test GPU-accelerated video transcoding (like I did with the 4070 Ti) with this card and h264_amf...
With the defaults using encoder-benchmark, I'm seeing:
Input #0, yuv4mpegpipe, from '720-60.y4m':
Duration: 00:00:30.00, start: 0.000000, bitrate: 663554 kb/s
Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p(tv, progressive), 1280x720, 60 fps, 60 tbr, 60 tbn
[vost#0:0 @ 0x5556232913c0] Unknown encoder 'h264_amf'
[vost#0:0 @ 0x5556232913c0] Error selecting an encoder
Error opening output file -.
Error opening output files: Encoder not found
So checking on which encoders are available, I see:
V....D libx264 libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
V....D libx264rgb libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 RGB (codec h264)
V....D h264_nvenc NVIDIA NVENC H.264 encoder (codec h264)
V..... h264_v4l2m2m V4L2 mem2mem H.264 encoder wrapper (codec h264)
V....D h264_vaapi H.264/AVC (VAAPI) (codec h264)
V....D h264_vulkan H.264/AVC (Vulkan) (codec h264)
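The exact command behind that listing isn't shown above. One way to get an equivalent list, as a sketch, is to filter `ffmpeg -encoders` for H.264 entries. `h264_encoders` is a hypothetical helper name; `-hide_banner` and `-encoders` are standard ffmpeg options.

```shell
# Keep only the H.264 lines from an `ffmpeg -encoders` listing read on stdin.
h264_encoders() {
  grep -E '\(codec h264\)'
}

# Only query ffmpeg if it is installed; ignore "no matches" from grep.
if command -v ffmpeg >/dev/null 2>&1; then
  ffmpeg -hide_banner -encoders 2>/dev/null | h264_encoders || true
fi
```

Note that an encoder appearing in this list only means ffmpeg was built with it, not that the matching hardware or driver is present (h264_nvenc shows up above even though the card is an AMD one).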
Hmm... it would probably be best to test with VAAPI, as I think that's what the amdgpu driver supports.
I'll try running ffmpeg directly instead, since VAAPI support in encoder-benchmark may be a future feature:
time ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD129 -i 720-60.y4m -filter_hw_device foo -vf 'format=nv12,hwupload' -c:v h264_vaapi -pix_fmt yuv420p -movflags +faststart 720-60.mp4 && \
time ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD129 -i 1080-60.y4m -filter_hw_device foo -vf 'format=nv12,hwupload' -c:v h264_vaapi -pix_fmt yuv420p -movflags +faststart 1080-60.mp4 && \
time ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD129 -i 4k-60.y4m -filter_hw_device foo -vf 'format=nv12,hwupload' -c:v h264_vaapi -pix_fmt yuv420p -movflags +faststart 4k-60.mp4
| Video file | File size | Time (sec) | Average fps |
|---|---|---|---|
| 720-60.y4m | 2.4G | 7 | 260 |
| 1080-60.y4m | 5.3G | 17 | 109 |
| 4k-60.y4m | 11G | 73 | 27 |
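As a rough cross-check of the fps column, average throughput can be estimated as total frames divided by wall-clock time. This sketch assumes all three clips are 30 seconds at 60 fps; only the 720p clip's duration and frame rate appear in the output above, so the other two are an assumption. `avg_fps` is an illustrative helper name.

```shell
# avg_fps DURATION_SEC SOURCE_FPS ELAPSED_SEC
# Prints frames encoded per wall-clock second, rounded to a whole number.
avg_fps() {
  awk -v d="$1" -v f="$2" -v t="$3" 'BEGIN { printf "%.0f\n", d * f / t }'
}

avg_fps 30 60 7    # 720-60.y4m
avg_fps 30 60 17   # 1080-60.y4m
avg_fps 30 60 73   # 4k-60.y4m
```

This gives about 257, 106 and 25 fps, slightly below the table's values. That gap would be consistent with the table using ffmpeg's own fps readout while the times are rounded to whole seconds, but that is a guess.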
The card was using around 125W during transcoding:
I would expect the numbers to be substantially higher on the Intel PC as well, just like with the 4070 Ti...