Vitis-Tutorials
HwAcc/D/08-alveo_aurora_kernel: bit errors
Hi, I am trying to use the Aurora IP with a U250 card, following "Using Aurora IP in Alveo with Vitis Flow". I can run the example, but there are two problems:
- Sometimes bit errors are found without any Aurora error being reported.
------------------------ krnl_aurora loopback test ------------------------
Transfer size: 1000 KB
Generate TX data block.
Program running in hardware mode
Load krnl_aurora_test_hw.xclbin
Create kernels
Create TX and RX device buffer
Transfer TX data into device buffer
Check whether startup status of Aurora kernel is ready...
Aurora kernel startup status is GOOD: 1000111111111
[12]channel_up [11]soft_err [10]hard_err [9]mmcm_not_locked_out [8]gt_pll_lock [7:4]line_up [3:0]gt_powergood
Begin data loopback transfer
run_strm_dump.start
run_strm_issue.start
run_strm_issue.wait
run_strm_dump.wait
Data loopback transfer finish
Transfer time = 0.316 ms
Fetch RX data from device buffer and verification
Data loopback transfer throughput = 25.3165 Gbps
Aurora Error Status:
SOFT_ERR: 0
HARD_ERR: 0
ref_data[237254] = 39, out_data[237254] = 19
ref_data[237283] = 60, out_data[237283] = 20
ref_data[237285] = 57, out_data[237285] = 5f
Data verification FAIL
Total mismatched bytes: 3
Please check tx_data.dat and rx_data.dat files
- Sometimes bytes are lost and the host code gets stuck at
run_strm_dump.wait()
since dump_krnl cannot receive the expected number of bytes. This happens very frequently once the transfer size is large enough, e.g., >10 MB:
------------------------ krnl_aurora loopback test ------------------------
Transfer size: 10000 KB
Generate TX data block.
Program running in hardware mode
Load krnl_aurora_test_hw.xclbin
Create kernels
Create TX and RX device buffer
Transfer TX data into device buffer
Check whether startup status of Aurora kernel is ready...
Aurora kernel startup status is GOOD: 1000111111111
[12]channel_up [11]soft_err [10]hard_err [9]mmcm_not_locked_out [8]gt_pll_lock [7:4]line_up [3:0]gt_powergood
Begin data loopback transfer
run_strm_dump.start
run_strm_issue.start
run_strm_issue.wait
What causes these two issues? Are they related to the FIFO or to the Aurora IP? Thanks in advance.
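The status word and the three mismatched bytes in the logs above can be unpacked with a short script. This is only a sketch: the field layout is copied from the legend line in the log, the mismatch values are assumed to be printed in hexadecimal (the `5f` suggests so), and the helper names `decode_status` and `flipped_bits` are hypothetical, not part of the tutorial's host code.

```python
# Field layout taken from the legend printed in the log:
# [12]channel_up [11]soft_err [10]hard_err [9]mmcm_not_locked_out
# [8]gt_pll_lock [7:4]line_up [3:0]gt_powergood
STATUS_FIELDS = [
    ("channel_up", 12, 12),
    ("soft_err", 11, 11),
    ("hard_err", 10, 10),
    ("mmcm_not_locked_out", 9, 9),
    ("gt_pll_lock", 8, 8),
    ("line_up", 7, 4),
    ("gt_powergood", 3, 0),
]


def decode_status(bits: str) -> dict:
    """Split the binary status string from the log into named fields."""
    word = int(bits, 2)
    fields = {}
    for name, high, low in STATUS_FIELDS:
        width = high - low + 1
        fields[name] = (word >> low) & ((1 << width) - 1)
    return fields


def flipped_bits(ref: int, out: int) -> list:
    """Return the bit positions (0 = LSB) where two bytes differ."""
    diff = ref ^ out
    return [i for i in range(8) if (diff >> i) & 1]


status = decode_status("1000111111111")
for name, value in status.items():
    print(f"{name:20s} = {value:#x}")

# (index, ref_data, out_data) copied from the failing run; values assumed hex.
mismatches = [(237254, 0x39, 0x19), (237283, 0x60, 0x20), (237285, 0x57, 0x5F)]
for index, ref, out in mismatches:
    print(f"byte {index}: flipped bit(s) {flipped_bits(ref, out)}")
```

Under these assumptions, every status field reads as healthy (channel up, no soft or hard error, all four lanes up), and each mismatch is a single flipped bit, with all three falling inside a 32-byte window of the transfer.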
It looks like these errors are related to the hardware or the Aurora IP configuration, not the Vitis design flow. @Rampagee, please confirm.
Sorry for the late response. Hi, @zhuofanzhang, are you using the 10 Gbps or the 25 Gbps lane rate? In the latter case (25 Gbps), you will first need a QSFP28 (25G) loopback module; a QSFP+ (10G) module may work, but it is not stable at the 25 Gbps rate. Second, the design needs some modification, as the README mentions (the modified source code is not provided in the repo), because the required user clock exceeds the default 300 MHz platform clock. Changing the platform clock is a bit complicated, so I suggest expanding the AXI stream port width to 512 bits. Thanks.
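The clock argument in this reply can be checked with some rough arithmetic. The sketch below estimates the minimum stream clock needed to carry the Aurora payload at a given AXI stream width. Several inputs are assumptions rather than facts from this thread: four lanes (suggested by the 4-bit `line_up` field in the log), 64B/66B line encoding, a 256-bit default stream width, and nominal lane rates of exactly 10 and 25 Gbps. The function name `required_clock_mhz` is hypothetical.

```python
PLATFORM_CLOCK_MHZ = 300  # default platform clock mentioned in the reply


def required_clock_mhz(lane_rate_gbps, stream_width_bits, lanes=4,
                       encoding_efficiency=64 / 66):
    """Lower bound on the stream clock needed to sustain the payload rate.

    lanes and encoding_efficiency are assumptions (4 lanes, 64B/66B),
    not values confirmed in this thread.
    """
    payload_bps = lanes * lane_rate_gbps * 1e9 * encoding_efficiency
    return payload_bps / stream_width_bits / 1e6


for lane_rate in (10, 25):
    for width in (256, 512):
        mhz = required_clock_mhz(lane_rate, width)
        verdict = "fits within" if mhz <= PLATFORM_CLOCK_MHZ else "exceeds"
        print(f"{lane_rate} Gbps x 4 lanes, {width}-bit stream: "
              f"{mhz:.1f} MHz ({verdict} {PLATFORM_CLOCK_MHZ} MHz)")
```

With these assumed numbers, a 256-bit stream at the 25 Gbps lane rate needs roughly 379 MHz, above the 300 MHz platform clock, while a 512-bit stream needs roughly 189 MHz. That is consistent with the suggestion to widen the port instead of changing the platform clock, but the exact user clock of the generated IP should be taken from the Aurora core configuration, not from this estimate.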
Hi, @Rampagee. As for the above issues, I am using the 10 Gbps lane rate, but with a QSFP28 optical transceiver. I connect the TX port to the RX port using a single-mode fiber.
I also tried the 25 Gbps lane rate following the README. Strangely, the bit errors and byte loss do not happen in this scenario.
Hi, @zhuofanzhang, are you connecting two QSFP28 ports on one card (one as TX and the other as RX), or connecting two cards? I will try to reproduce your scenario.
Hi, @Rampagee, I use only one QSFP28 port. A single-mode fiber is used to connect TX and RX of the same QSFP28 port. I think it is the same as using a loopback module.
Hi, @zhuofanzhang, the bit errors might be caused by several factors. Could you please try a loopback module first to rule out logic design issues? For the fiber case, you may need to adjust some advanced settings of the Aurora IP, such as the equalization mode. This issue is not related to the Vitis flow, and sorry, I am not an expert on the Aurora IP; you may post a question on the Xilinx developer forum regarding Aurora IP issues. Thanks.
Sure. I will close the issue. Thanks.
@zhuofanzhang, have you solved this issue? We had the same problems: we solved 1) but are still stuck with 2), and would be really glad for any help or hint.