
Test AMD Radeon AI Pro R9700

Open martincerven opened this issue 1 month ago • 10 comments

The Radeon AI Pro R9700 has 32 GB of VRAM and was introduced in 2025.


Getting the card working on Pi OS 13 "Trixie"

See: Using AMD GPUs on Raspberry Pi without recompiling Linux.

martincerven, Oct 27 '25

It should work. I may try to find a way to get one of these to test. The RX 9070 XT works fine (see #766), and the R9700 uses a similar architecture with more VRAM...

Edit: I should be getting one soon ;)

geerlingguy, Oct 27 '25

Test setup:

[Image: test setup]

lspci output:

0001:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 [Radeon AI PRO R9700] (rev c0) (prog-if 00 [VGA controller])
	Subsystem: XFX Limited Device 9801
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 188
	Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at 1810000000 (64-bit, prefetchable) [size=2M]
	Region 5: Memory at 1b80000000 (32-bit, non-prefetchable) [size=512K]
	Expansion ROM at 1b80080000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [64] Express (v2) Legacy Endpoint, IntMsgNum 0
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ TEE-IO-
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 32GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 32GT/s, Width x16
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
			 AtomicOpsCtl: ReqEn-
			 IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
			 10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
		LnkCap2: Supported Link Speeds: 2.5-32GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 32GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 000000fffffff000  Data: 0009
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
			ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
			PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
			ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
			PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
			ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
			PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [200 v1] Physical Resizable BAR
		BAR 0: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
		BAR 2: current size: 2MB, supported: 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB
	Capabilities: [240 v1] Power Budgeting <?>
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [2d0 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv+, Max PASID Width: 10
		PASIDCtl: Enable- Exec- Priv-
	Capabilities: [320 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [450 v1] Lane Margining at the Receiver
		PortCap: Uses Driver-
		PortSta: MargReady+ MargSoftReady-
	Capabilities: [500 v1] Physical Layer 32.0 GT/s <?>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
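
The full dump is long, and only the link fields matter for the PCIe speed question that comes up later. Here's a minimal sketch (a hypothetical `lnk_summary` helper, POSIX sh + awk) that reads `lspci -vv` output on stdin and prints one `LnkSta` line per device. It assumes the usual lspci layout, where device headers start at column 1 and capability lines are indented.

```shell
#!/bin/sh
# lnk_summary (hypothetical helper): print "<address>: <LnkSta fields>" for each
# device found in `lspci -vv` output read from stdin.
lnk_summary() {
  awk '
    # Device header lines start with the bus address, e.g. "0001:03:00.0 VGA ..."
    /^[0-9a-f]/ { dev = $1 }
    # Link status lines are indented: "LnkSta:<TAB>Speed 32GT/s, Width x16".
    # Requiring the colon right after "LnkSta" keeps "LnkSta2:" lines out.
    /^[ \t]+LnkSta:/ {
      sub(/^[ \t]+LnkSta:[ \t]*/, "")
      print dev ": " $0
    }
  '
}

# Usage: sudo lspci -vv | lnk_summary
```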

Thanks to Micro Center for helping me on this project. I will be working on some special tests for next month ;)

But I'll continue testing in this issue...

geerlingguy, Nov 14 '25

A few benchmarks:

vkmark

WARNING: radv is not a conformant Vulkan implementation, testing use only.
WARNING: radv is not a conformant Vulkan implementation, testing use only.
=======================================================
    vkmark 2025.01
=======================================================
    Vendor ID:      0x1002
    Device ID:      0x7551
    Device Name:    AMD Radeon Graphics (RADV GFX1201)
    Driver Version: 104857607
    Device UUID:    c284624a6065d2a279721dfe5e3867cf
=======================================================
[vertex] device-local=true: FPS: 17396 FrameTime: 0.057 ms
[vertex] device-local=false: FPS: 757 FrameTime: 1.321 ms
[texture] anisotropy=0: FPS: 17329 FrameTime: 0.058 ms
[texture] anisotropy=16: FPS: 17482 FrameTime: 0.057 ms
[shading] shading=gouraud: FPS: 17510 FrameTime: 0.057 ms
[shading] shading=blinn-phong-inf: FPS: 17402 FrameTime: 0.057 ms
[shading] shading=phong: FPS: 17438 FrameTime: 0.057 ms
[shading] shading=cel: FPS: 17336 FrameTime: 0.058 ms
[effect2d] kernel=edge: FPS: 17602 FrameTime: 0.057 ms
[effect2d] kernel=blur: FPS: 17685 FrameTime: 0.057 ms
[desktop] <default>: FPS: 15897 FrameTime: 0.063 ms
[cube] <default>: FPS: 17477 FrameTime: 0.057 ms
[clear] <default>: FPS: 14055 FrameTime: 0.071 ms
=======================================================
                                   vkmark Score: 15797
=======================================================
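
vkmark's overall score appears to be the plain mean of the per-scene FPS values: the 13 scenes above sum to 205,366 FPS, and 205,366 / 13 ≈ 15,797, which matches the reported score. A small awk sketch (hypothetical `mean_fps` helper) to reproduce that from saved output, assuming every result line contains an `FPS: <number>` pair:

```shell
#!/bin/sh
# mean_fps (hypothetical helper): average every "FPS: <n>" value in
# vkmark/glmark2-style output read from stdin.
mean_fps() {
  awk '
    {
      # Scan all fields so the match works whatever the scene name/options are.
      for (i = 1; i < NF; i++)
        if ($i == "FPS:") { sum += $(i + 1); n++ }
    }
    END {
      if (n == 0) { print "no FPS values found" > "/dev/stderr"; exit 1 }
      printf "%d scenes, mean FPS %.1f\n", n, sum / n
    }
  '
}

# Usage: vkmark 2>&1 | tee vkmark.log; mean_fps < vkmark.log
```

Fed the 13 vkmark scene lines above, it prints `13 scenes, mean FPS 15797.4`. The score line itself has no `FPS:` token, so it isn't counted.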

glmark2-es2-wayland

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon Graphics (radeonsi, gfx1201, ACO, DRM 3.64, 6.17.5-v8-16k+)
    GL_VERSION:     OpenGL ES 3.2 Mesa 25.0.7-2+rpt3
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 941 FrameTime: 1.063 ms
[build] use-vbo=true: FPS: 9289 FrameTime: 0.108 ms
[texture] texture-filter=nearest: FPS: 10564 FrameTime: 0.095 ms
[texture] texture-filter=linear: FPS: 9515 FrameTime: 0.105 ms
[texture] texture-filter=mipmap: FPS: 10452 FrameTime: 0.096 ms
[shading] shading=gouraud: FPS: 10930 FrameTime: 0.091 ms
[shading] shading=blinn-phong-inf: FPS: 10482 FrameTime: 0.095 ms
[shading] shading=phong: FPS: 9220 FrameTime: 0.108 ms
[shading] shading=cel: FPS: 9452 FrameTime: 0.106 ms
[bump] bump-render=high-poly: FPS: 10325 FrameTime: 0.097 ms
[bump] bump-render=normals: FPS: 10312 FrameTime: 0.097 ms
[bump] bump-render=height: FPS: 10478 FrameTime: 0.095 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 10439 FrameTime: 0.096 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 10355 FrameTime: 0.097 ms
[pulsar] light=false:quads=5:texture=false: FPS: 9750 FrameTime: 0.103 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 6129 FrameTime: 0.163 ms
[desktop] effect=shadow:windows=4: FPS: 6814 FrameTime: 0.147 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 380 FrameTime: 2.636 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 593 FrameTime: 1.687 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 405 FrameTime: 2.473 ms
[ideas] speed=duration: FPS: 4343 FrameTime: 0.230 ms
[jellyfish] <default>: FPS: 8772 FrameTime: 0.114 ms
[terrain] <default>: FPS: 4226 FrameTime: 0.237 ms
[shadow] <default>: FPS: 8124 FrameTime: 0.123 ms
[refract] <default>: FPS: 7164 FrameTime: 0.140 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 10835 FrameTime: 0.092 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 10703 FrameTime: 0.093 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 10293 FrameTime: 0.097 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 10461 FrameTime: 0.096 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 10697 FrameTime: 0.093 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 10472 FrameTime: 0.095 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 9871 FrameTime: 0.101 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 10506 FrameTime: 0.095 ms
=======================================================
                                  glmark2 Score: 8280 
=======================================================

GravityMark

Result: 58,587

[Image: GravityMark result]

Here's a video of it running. The card draws quite a bit of power, and the coil whine varies a lot through the different scenes:

https://github.com/user-attachments/assets/fd1bfa12-3654-4650-a48f-47ad18c49797

geerlingguy, Nov 14 '25

I just noticed that the GPU's link status reports:

LnkSta:	Speed 32GT/s, Width x16

But nvtop shows Gen 2 x1...

[Image: nvtop showing a PCIe Gen 2 x1 link]

I have PCIe Gen 3 configured in /boot/firmware/config.txt, and the BCM2712 bridge itself reports a link capability of 8 GT/s:

pi@cm5:~ $ sudo lspci -vv | grep -E 'PCI bridge|LnkCap'
0001:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries BCM2712 PCIe Bridge (rev 30) (prog-if 00 [Normal decode])
		LnkCap:	Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS+
0001:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 24) (prog-if 00 [Normal decode])
		LnkCap:	Port #0, Speed 32GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
		LnkCap2: Supported Link Speeds: 2.5-32GT/s, Crosslink- Retimer+ 2Retimers+ DRS-

Weird.
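
One plausible explanation (an assumption, not verified here): these Navi cards include an on-board PCIe switch, which is the `Navi 10 XL Upstream Port of PCI Express Switch` entry at 0001:01:00.0. The GPU endpoint's `LnkSta` would then describe its internal link to the switch's downstream port, not the link to the Pi, so the negotiated speed worth checking is the one on the BCM2712 bridge or the switch's upstream port. The kernel exposes these as `current_link_speed`/`current_link_width` (plus `max_link_speed`/`max_link_width`) under `/sys/bus/pci/devices/`. A minimal sketch that prints them for every device; the `pci_links` name and its optional root argument are hypothetical, the latter only there to make it testable against a fake tree:

```shell
#!/bin/sh
# pci_links (hypothetical helper): show negotiated vs. maximum PCIe link for each
# device. Takes an optional sysfs root so it can be run against a fake tree.
pci_links() {
  root=${1:-/sys/bus/pci/devices}
  for dev in "$root"/*; do
    # Skip entries that don't expose PCIe link attributes.
    [ -r "$dev/current_link_speed" ] || continue
    printf '%s: %s x%s (max %s x%s)\n' \
      "$(basename "$dev")" \
      "$(cat "$dev/current_link_speed")" \
      "$(cat "$dev/current_link_width")" \
      "$(cat "$dev/max_link_speed")" \
      "$(cat "$dev/max_link_width")"
  done
}

# Usage on the Pi: pci_links
```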

geerlingguy, Nov 24 '25

Running some AI Benchmarks here: https://github.com/geerlingguy/ai-benchmarks/issues/35

geerlingguy, Nov 24 '25

Power draw ranges from 18.2 W at idle (CM5 + card in eGPU dock, measured at the wall outlet) to 289.5 W at full tilt:

[Image: power draw measurement]

geerlingguy, Nov 24 '25

Testing the same card on an Intel Core Ultra 7 265K platform running Ubuntu 25.10 with kernel 6.17.x (see https://github.com/geerlingguy/sbc-reviews/issues/93):

  • vkmark: 40285
  • glmark2-es2-wayland: 21145
  • GravityMark: 64,469
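
Putting the Pi numbers next to these, vkmark and glmark2 land at roughly 39% of the Intel result, while GravityMark reaches about 91%. A quick awk sketch for that arithmetic (the `pct` helper is hypothetical):

```shell
#!/bin/sh
# pct (hypothetical helper): Pi score as a percentage of the Intel PC score.
pct() {
  awk -v pi="$1" -v pc="$2" 'BEGIN { printf "%.1f\n", 100 * pi / pc }'
}

echo "vkmark:      $(pct 15797 40285)%"   # 39.2%
echo "glmark2:     $(pct 8280 21145)%"    # 39.2%
echo "GravityMark: $(pct 58587 64469)%"   # 90.9%
```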

AI results for both platforms: https://github.com/geerlingguy/ai-benchmarks/issues/35

geerlingguy, Nov 25 '25

The card has been quite stable. It's interesting to see the difference between running it in a computer with very few limitations and running it on the Pi 5...

geerlingguy, Nov 25 '25

I want to test GPU-accelerated video transcoding (like I did with the 4070 Ti) with this card and h264_amf...

Running encoder-benchmark with its defaults, I'm seeing:

Input #0, yuv4mpegpipe, from '720-60.y4m':
  Duration: 00:00:30.00, start: 0.000000, bitrate: 663554 kb/s
  Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p(tv, progressive), 1280x720, 60 fps, 60 tbr, 60 tbn
[vost#0:0 @ 0x5556232913c0] Unknown encoder 'h264_amf'
[vost#0:0 @ 0x5556232913c0] Error selecting an encoder
Error opening output file -.
Error opening output files: Encoder not found

Checking which encoders are available, I see:

 V....D libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V....D libx264rgb           libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 RGB (codec h264)
 V....D h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 encoder wrapper (codec h264)
 V....D h264_vaapi           H.264/AVC (VAAPI) (codec h264)
 V....D h264_vulkan          H.264/AVC (Vulkan) (codec h264)

Hmm... it might be best to test with VAAPI, since I think that's what the amdgpu driver supports?
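
Since `h264_amf` isn't in this ffmpeg build, a script could pick the first available encoder from `ffmpeg -hide_banner -encoders` instead of hard-coding one. A minimal sketch; the `pick_h264_encoder` name and the preference order (AMF, then VAAPI, then Vulkan, then software libx264) are assumptions for illustration:

```shell
#!/bin/sh
# pick_h264_encoder (hypothetical helper): read `ffmpeg -encoders` output on stdin
# and print the first H.264 encoder found, in an assumed order of preference.
pick_h264_encoder() {
  # In each encoder line the second column is the encoder name.
  names=$(awk '{ print $2 }')
  for enc in h264_amf h264_vaapi h264_vulkan libx264; do
    if printf '%s\n' "$names" | grep -qx "$enc"; then
      echo "$enc"
      return 0
    fi
  done
  return 1  # none of the preferred encoders is available
}

# Usage: ffmpeg -hide_banner -encoders | pick_h264_encoder
```

With the encoder list above, this would choose `h264_vaapi`.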

geerlingguy, Dec 11 '25

I'll try running ffmpeg directly instead, since VAAPI encoder support in encoder-benchmark may be a future feature:

time ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD129 -i 720-60.y4m -filter_hw_device foo -vf 'format=nv12,hwupload' -c:v h264_vaapi -pix_fmt yuv420p -movflags +faststart 720-60.mp4 && \
time ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD129 -i 1080-60.y4m -filter_hw_device foo -vf 'format=nv12,hwupload' -c:v h264_vaapi -pix_fmt yuv420p -movflags +faststart 1080-60.mp4 && \
time ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD129 -i 4k-60.y4m -filter_hw_device foo -vf 'format=nv12,hwupload' -c:v h264_vaapi -pix_fmt yuv420p -movflags +faststart 4k-60.mp4

Video file     File size   Time (s)   Average fps
720-60.y4m     2.4G        7          260
1080-60.y4m    5.3G        17         109
4k-60.y4m      11G         73         27
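
The three chained commands can also be written as a loop. This sketch (hypothetical `transcode_all` function) times each run with `date +%s` so it works in plain POSIX sh, adds `-y` so reruns overwrite earlier outputs, and takes the render node from a `RENDER_NODE` variable. `renderD129` matched the R9700 in this setup, presumably because the Pi's own GPU claims the first node, but the number depends on which DRM devices the system enumerates. It also leaves out `-pix_fmt yuv420p`: `hwupload` already hands NV12 hardware frames to `h264_vaapi`, so that option should be unnecessary.

```shell
#!/bin/sh
# transcode_all (hypothetical helper): VAAPI H.264 transcode of each .y4m clip
# given as an argument, printing wall-clock seconds per clip.
RENDER_NODE=${RENDER_NODE:-/dev/dri/renderD129}  # assumption: the AMD card's render node

transcode_all() {
  for clip in "$@"; do
    out="${clip%.y4m}.mp4"
    start=$(date +%s)
    # Convert to NV12, upload frames to the GPU, then encode with h264_vaapi.
    ffmpeg -hide_banner -y \
      -init_hw_device vaapi=va:"$RENDER_NODE" -filter_hw_device va \
      -i "$clip" -vf 'format=nv12,hwupload' \
      -c:v h264_vaapi -movflags +faststart "$out" || return 1
    end=$(date +%s)
    echo "$clip -> $out: $((end - start)) s"
  done
}

# Usage: transcode_all 720-60.y4m 1080-60.y4m 4k-60.y4m
```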

The card was using around 125 W during transcoding:

[Image: power draw during transcoding]

I would expect the numbers to be substantially higher on the Intel PC as well, just like with the 4070 Ti...

geerlingguy, Dec 11 '25