
DMA & mem-bw validation failed with Alveo U280

Open xsun2001 opened this issue 2 years ago • 4 comments

Our brand new Alveo U280 card failed to pass validation. Detailed information and outputs are listed below. I have noticed that #6104 and #6105 are similar to my issue; however, I could not find a solution there. I will try to disable the IOMMU and report back later.
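For reference, the usual way to disable the IOMMU on Debian is via the kernel command line; the exact option depends on the platform (intel_iommu=off on Intel, amd_iommu=off on AMD), so take this as a sketch rather than the exact fix:

$ sudoedit /etc/default/grub   # append intel_iommu=off (or amd_iommu=off) to GRUB_CMDLINE_LINUX
$ sudo update-grub
$ sudo reboot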

  • uname -a:
$ uname -a
Linux xavier 5.10.0-15-amd64 #1 SMP Debian 5.10.120-1 (2022-06-09) x86_64 GNU/Linux
  • dma failure:
$ sudo /opt/xilinx/xrt/bin/xbutil validate -d 0000:81:00.1 -r dma --verbose
Verbose: Enabling Verbosity
Starting validation for 1 devices

Validate Device           : [0000:81:00.1]
    Platform              : xilinx_u280_xdma_201920_3
    SC Version            : 4.3.15
    Platform ID           : 0x5e278820
-------------------------------------------------------------------------------
Test 1 [0000:81:00.1]     : dma 
    Description           : Run dma test
    Details               : Buffer size - '16 MB'
    Error(s)              : DMA failed: Input/output error
    Test Status           : [FAILED]
-------------------------------------------------------------------------------
Validation failed
  • mem-bw failure:
$ sudo /opt/xilinx/xrt/bin/xbutil validate -d 0000:81:00.1 -r mem-bw --verbose
Verbose: Enabling Verbosity
Starting validation for 1 devices

Validate Device           : [0000:81:00.1]
    Platform              : xilinx_u280_xdma_201920_3
    SC Version            : 4.3.15
    Platform ID           : 0x5e278820
-------------------------------------------------------------------------------
Test 1 [0000:81:00.1]     : mem-bw 
    Description           : Run 'bandwidth kernel' and check the throughput
    Xclbin                : /opt/xilinx/xsa/xilinx_u280_xdma_201920_3/test
    Testcase              : /opt/xilinx/xrt/test/23_bandwidth.py
    Error(s)              : Host buffer alignment 4096 bytes
                            Compiled kernel =
                            /opt/xilinx/xsa/xilinx_u280_xdma_201920_3/test/bandwidth.xclbin
                            unable to sync BO: Input/output error
                            FAILED TEST
                            
    Test Status           : [FAILED]
-------------------------------------------------------------------------------
Validation failed
  • dmesg
[ 2354.198996] xocl:xdma_xfer_fastpath: Wait for request timed out
[ 2354.199002] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x000000006a917350) = 0x1fc00006 (id).
[ 2354.199629] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x000000008f3791cb) = 0x00000001 (status).
[ 2354.199632] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000b55e7658) = 0x00f83e1f (control)
[ 2354.199635] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x0000000008af2f3e) = 0xfff20000 (first_desc_lo)
[ 2354.199637] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000dfedf861) = 0x00000000 (first_desc_hi)
[ 2354.199640] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000af59ae7d) = 0x0000001f (first_desc_adjacent).
[ 2354.199642] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000530e7f2f) = 0x0000001f (completed_desc_count).
[ 2354.199645] xocl:engine_reg_dump: 0-H2C0-MM: ioread32(0x00000000c5d3cdee) = 0x00f83e1e (interrupt_enable_mask)
[ 2354.199648] xocl:check_nonzero_interrupt_status: 0000:81:00.1 xdma0 user_int_enable = 0x0000000f
[ 2354.199650] xocl:check_nonzero_interrupt_status: 0000:81:00.1 xdma0 channel_int_enable = 0x0000000f
[ 2354.199721] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: DMA failed, Dumping SG Page Table, ep addr 4000000000
[ 2354.200601] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 0, 0x1247a6000
[ 2354.201085] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 1, 0x181c7f000
[ 2354.201568] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 2, 0x1a12e2000
[ 2354.202048] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 3, 0x10ddc3000
[ 2354.202524] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 4, 0x1e3c60000
[ 2354.203008] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 5, 0x1c003e000
[ 2354.203484] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 6, 0x124571000
[ 2354.203905] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 7, 0x1819a3000
[ 2354.204205] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 8, 0x18186a000
[ 2354.204505] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 9, 0x14843b000
[ 2354.204803] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 10, 0x1c434b000
[ 2354.205102] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 11, 0x1c6938000
[ 2354.205398] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 12, 0x10dfb8000
[ 2354.205697] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 13, 0x1c4fb2000
[ 2354.205993] xocl 0000:81:00.1: dma.xdma.u.5242880 ffff8c19e5452c10 xdma_migrate_bo: 14, 0x185e56000
...
  • xbutil examine
$ sudo /opt/xilinx/xrt/bin/xbutil examine 
System Configuration
  OS Name              : Linux
  Release              : 5.10.0-15-amd64
  Version              : #1 SMP Debian 5.10.120-1 (2022-06-09)
  Machine              : x86_64
  CPU Cores            : 32
  Memory               : 112628 MB
  Distribution         : Debian GNU/Linux 11 (bullseye)
  GLIBC                : 2.31
  Model                : Super Server

XRT
  Version              : 2.13.0
  Branch               : 
  Hash                 : 
  Hash Date            : 2022-06-20 13:51:55
  XOCL                 : 2.13.0, 
  XCLMGMT              : 2.13.0, 

Devices present
BDF             :  Shell                      Platform UUID  Device ID       Device Ready*  
[0000:81:00.1]  :  xilinx_u280_xdma_201920_3  0x5e278820     user(inst=128)  Yes            

* Devices that are not ready will have reduced functionality when using XRT tools

xsun2001 avatar Jun 20 '22 08:06 xsun2001

I have noticed on my U50 that I cannot pass the test with a block size larger than 4 KiB. Can you try adding --run dma --param dma:block-size:4096?
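For the device in this issue, the full command would look something like this (same xbutil path and BDF as in the outputs above):

$ sudo /opt/xilinx/xrt/bin/xbutil validate -d 0000:81:00.1 --run dma --param dma:block-size:4096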

keryell avatar Jun 21 '22 01:06 keryell

> I have noticed on my U50 that I cannot pass the test with a block size larger than 4 KiB. Can you try adding --run dma --param dma:block-size:4096?

    Description           : Run dma test
    Details               : Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2977.6 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2344.4 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2963.6 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2752.6 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2974.9 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2459.7 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2989.0 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3114.1 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2963.1 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3227.1 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2979.5 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3013.7 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2963.8 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3220.1 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2964.3 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3036.2 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2975.0 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3098.9 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2976.5 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2959.2 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2968.3 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3080.4 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2956.0 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2833.8 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2958.7 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3214.0 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2962.7 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2872.4 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2975.0 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2017.2 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2996.8 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3164.7 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2936.2 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2769.3 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2961.3 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2496.0 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2968.3 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2445.5 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2704.1 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2168.6 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2890.4 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2372.5 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2932.8 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2555.7 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2961.8 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3072.6 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2899.6 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2440.3 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2963.2 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3221.8 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2901.9 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2317.8 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2959.2 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2417.3 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2964.9 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3211.8 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2950.7 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 3103.7 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2979.0 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2164.4 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2961.5 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2386.5 MB/s
                            Buffer size - '40960 Byte'
                            Host -> PCIe -> FPGA write bandwidth = 2826.0 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 2901.6 MB/s
    Test Status           : [PASSED]
Validation completed

It seems that with block sizes larger than 4096 bytes, the transfer speed gets faster.
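If you want to find where it starts to fail, a quick sweep over a few block sizes (same command as above; the sizes here are just examples) could look like:

$ for bs in 4096 8192 16384 65536 1048576; do
>   echo "== block-size $bs =="
>   sudo /opt/xilinx/xrt/bin/xbutil validate -d 0000:81:00.1 --run dma --param dma:block-size:$bs
> done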

AchingSoul000 avatar Jun 29 '22 08:06 AchingSoul000

Thanks guys. I can confirm that the IOMMU caused this issue. In my opinion, this incompatibility should be noted in the getting started guide or some other obvious place.
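For anyone who runs into this later, a quick way to check whether the IOMMU is active on the host (standard kernel interfaces, nothing XRT-specific):

$ cat /proc/cmdline        # look for intel_iommu= / amd_iommu= / iommu= options
$ ls /sys/class/iommu/     # non-empty when an IOMMU is active
$ sudo dmesg | grep -iE 'dmar|iommu'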

xsun2001 avatar Jun 29 '22 08:06 xsun2001

In my opinion, this incompatibility should be fixed. :-)

keryell avatar Jun 29 '22 12:06 keryell