DMA & mem-bw validation failed with Alveo U280
Our brand new Alveo U280 card failed to pass validation. Detailed information and outputs are listed below. I have noticed that #6104 and #6105 are similar to my issue; however, I cannot find a solution in them. I will try to disable the IOMMU and report back later.
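For reference, on a GRUB-based Debian install the IOMMU is usually turned off from the kernel command line; the sketch below is only the general recipe (the exact flag depends on the CPU vendor, and some setups prefer iommu=pt instead of disabling it outright):
# /etc/default/grub -- append the IOMMU-off flag to the default kernel command line
#   Intel CPU: intel_iommu=off      AMD CPU: amd_iommu=off
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"
$ sudo update-grub      # regenerate /boot/grub/grub.cfg on Debian
$ sudo reboot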
- uname -a:
$ uname -a
Linux xavier 5.10.0-15-amd64 #1 SMP Debian 5.10.120-1 (2022-06-09) x86_64 GNU/Linux
- XRT Version: Official 2022.1 release
- Deployment Target Platform: xilinx-u280-xdma-201920.3-3246211_18.04.deb from the official Xilinx website
- DMA failure:
$ sudo /opt/xilinx/xrt/bin/xbutil validate -d 0000:81:00.1 -r dma --verbose
Verbose: Enabling Verbosity
Starting validation for 1 devices
Validate Device : [0000:81:00.1]
Platform : xilinx_u280_xdma_201920_3
SC Version : 4.3.15
Platform ID : 0x5e278820
-------------------------------------------------------------------------------
Test 1 [0000:81:00.1] : dma
Description : Run dma test
Details : Buffer size - '16 MB'
Error(s) : DMA failed: Input/output error
Test Status : [FAILED]
-------------------------------------------------------------------------------
Validation failed
- mem-bw failure:
$ sudo /opt/xilinx/xrt/bin/xbutil validate -d 0000:81:00.1 -r mem-bw --verbose
Verbose: Enabling Verbosity
Starting validation for 1 devices
Validate Device : [0000:81:00.1]
Platform : xilinx_u280_xdma_201920_3
SC Version : 4.3.15
Platform ID : 0x5e278820
-------------------------------------------------------------------------------
Test 1 [0000:81:00.1] : mem-bw
Description : Run 'bandwidth kernel' and check the throughput
Xclbin : /opt/xilinx/xsa/xilinx_u280_xdma_201920_3/test
Testcase : /opt/xilinx/xrt/test/23_bandwidth.py
Error(s) : Host buffer alignment 4096 bytes
Compiled kernel =
/opt/xilinx/xsa/xilinx_u280_xdma_201920_3/test/bandwidth.xclbin
unable to sync BO: Input/output error
FAILED TEST
Test Status : [FAILED]
-------------------------------------------------------------------------------
Validation failed
- dmesg:
[ 2354.198996] xocl:xdma_xfer_fastpath: Wait for request timed out
[ 2354.199002] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x000000006a917350) = 0x1fc00006 (id).
[ 2354.199629] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x000000008f3791cb) = 0x00000001 (status).
[ 2354.199632] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000b55e7658) = 0x00f83e1f (control)
[ 2354.199635] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x0000000008af2f3e) = 0xfff20000 (first_desc_lo)
[ 2354.199637] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000dfedf861) = 0x00000000 (first_desc_hi)
[ 2354.199640] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000af59ae7d) = 0x0000001f (first_desc_adjacent).
[ 2354.199642] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000530e7f2f) = 0x0000001f (completed_desc_count).
[ 2354.199645] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000c5d3cdee) = 0x00f83e1e (interrupt_enable_mask)
[ 2354.199648] xocl:check_nonzero_interrupt_status: 0000:81:00.1 xdma0 user_int_enable = 0x0000000f
[ 2354.199650] xocl:check_nonzero_interrupt_status: 0000:81:00.1 xdma0 channel_int_enable = 0x0000000f
[ 2354.199721] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: DMA failed, Dumping SG Page Table, ep addr 4000000000
[ 2354.200601] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 0, 0x1247a6000
[ 2354.201085] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 1, 0x181c7f000
[ 2354.201568] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 2, 0x1a12e2000
[ 2354.202048] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 3, 0x10ddc3000
[ 2354.202524] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 4, 0x1e3c60000
[ 2354.203008] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 5, 0x1c003e000
[ 2354.203484] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 6, 0x124571000
[ 2354.203905] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 7, 0x1819a3000
[ 2354.204205] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 8, 0x18186a000
[ 2354.204505] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 9, 0x14843b000
[ 2354.204803] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 10, 0x1c434b000
[ 2354.205102] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 11, 0x1c6938000
[ 2354.205398] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 12, 0x10dfb8000
[ 2354.205697] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 13, 0x1c4fb2000
[ 2354.205993] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 14, 0x185e56000
...
- xbutil examine:
$ sudo /opt/xilinx/xrt/bin/xbutil examine
System Configuration
OS Name : Linux
Release : 5.10.0-15-amd64
Version : #1 SMP Debian 5.10.120-1 (2022-06-09)
Machine : x86_64
CPU Cores : 32
Memory : 112628 MB
Distribution : Debian GNU/Linux 11 (bullseye)
GLIBC : 2.31
Model : Super Server
XRT
Version : 2.13.0
Branch :
Hash :
Hash Date : 2022-06-20 13:51:55
XOCL : 2.13.0,
XCLMGMT : 2.13.0,
Devices present
BDF              :  Shell                      Platform UUID  Device ID        Device Ready*
[0000:81:00.1]   :  xilinx_u280_xdma_201920_3  0x5e278820     user(inst=128)   Yes
* Devices that are not ready will have reduced functionality when using XRT tools
I have noticed on my U50 that I cannot pass the test with more than 4 KiB. Can you try adding --run dma --param dma:block-size:4096?
Description : Run dma test
Details : Buffer size - '40960 Byte' (all runs)
          Host -> PCIe -> FPGA write    Host <- PCIe <- FPGA read
          2977.6 MB/s                   2344.4 MB/s
          2963.6 MB/s                   2752.6 MB/s
          2974.9 MB/s                   2459.7 MB/s
          2989.0 MB/s                   3114.1 MB/s
          2963.1 MB/s                   3227.1 MB/s
          2979.5 MB/s                   3013.7 MB/s
          2963.8 MB/s                   3220.1 MB/s
          2964.3 MB/s                   3036.2 MB/s
          2975.0 MB/s                   3098.9 MB/s
          2976.5 MB/s                   2959.2 MB/s
          2968.3 MB/s                   3080.4 MB/s
          2956.0 MB/s                   2833.8 MB/s
          2958.7 MB/s                   3214.0 MB/s
          2962.7 MB/s                   2872.4 MB/s
          2975.0 MB/s                   2017.2 MB/s
          2996.8 MB/s                   3164.7 MB/s
          2936.2 MB/s                   2769.3 MB/s
          2961.3 MB/s                   2496.0 MB/s
          2968.3 MB/s                   2445.5 MB/s
          2704.1 MB/s                   2168.6 MB/s
          2890.4 MB/s                   2372.5 MB/s
          2932.8 MB/s                   2555.7 MB/s
          2961.8 MB/s                   3072.6 MB/s
          2899.6 MB/s                   2440.3 MB/s
          2963.2 MB/s                   3221.8 MB/s
          2901.9 MB/s                   2317.8 MB/s
          2959.2 MB/s                   2417.3 MB/s
          2964.9 MB/s                   3211.8 MB/s
          2950.7 MB/s                   3103.7 MB/s
          2979.0 MB/s                   2164.4 MB/s
          2961.5 MB/s                   2386.5 MB/s
          2826.0 MB/s                   2901.6 MB/s
Test Status : [PASSED]
Validation completed
It seems that with block sizes larger than 4096, the transfer speed would be faster.
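To narrow down the largest block size that still passes, the same test can be swept over several sizes using the parameter suggested above (a rough sketch; the size list is arbitrary):
$ for sz in 4096 8192 16384 65536 1048576; do
    echo "== dma:block-size:${sz} =="
    sudo /opt/xilinx/xrt/bin/xbutil validate -d 0000:81:00.1 --run dma --param dma:block-size:${sz}
  done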
Thanks, guys. I can confirm that the IOMMU caused this issue. In my opinion, this incompatibility should be noted in the getting started guide or some other obvious place.
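For anyone landing here later, whether the IOMMU is actually disabled after the reboot can be checked with standard kernel interfaces (a sketch, not XRT-specific):
$ cat /proc/cmdline                      # should contain intel_iommu=off or amd_iommu=off
$ sudo dmesg | grep -i -e DMAR -e IOMMU  # no "IOMMU enabled" / DMAR table messages expected
$ ls /sys/class/iommu/                   # empty when no IOMMU is active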
In my opinion, this incompatibility should be fixed. :-)