XDMA tests failing with Error 512 on Alveo U200
Hello all,
We recently got an Alveo U200 and are currently stuck on getting the DMA driver to run properly. We first flashed the firmware via XRT so that the card is no longer on the factory GOLDEN image, and after making sure that xbutil validate passed all the tests, we are now trying to get the XDMA driver loaded. The test machine is running Ubuntu 20.04 with Vivado 2020.2.
Previously, the XDMA driver would not bind to the U200 card at all, so I modified the xdma_mod.c file from the dma_ip_drivers repo to add 10ee:5000 and 10ee:5001. This did get the driver to load, as seen in the lspci output, both through a regular modprobe and through load_drivers.sh, but running run_tests.sh shows that all the tests completed with errors. I also compiled the driver with debug mode on in case that's helpful for anyone else. The main error seems to be code 512.
me@computer:~$ uname -a
Linux computer 5.15.0-134-generic #145~20.04.1-Ubuntu SMP Mon Feb 17 13:27:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
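For anyone retracing the step above, here is a minimal sketch of how to confirm which kernel driver is actually bound to the card after adding the IDs, without parsing lspci output. It reads the standard sysfs PCI attributes (`vendor`, `device`, and the `driver` symlink). The function name `find_bound_drivers` and the constant names are made up for illustration; the IDs are the ones mentioned in the post.

```python
import os
from pathlib import Path

XILINX_VENDOR = 0x10EE
# Device IDs that were added to xdma_mod.c in the post above.
U200_DEVICE_IDS = {0x5000, 0x5001}


def find_bound_drivers(sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_address: driver_name or None} for matching PCI functions."""
    results = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return results
    for dev in sorted(root.iterdir()):
        try:
            # sysfs exposes these as text such as "0x10ee\n".
            vendor = int((dev / "vendor").read_text().strip(), 16)
            device = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue
        if vendor != XILINX_VENDOR or device not in U200_DEVICE_IDS:
            continue
        link = dev / "driver"
        # The "driver" symlink only exists while a driver is bound.
        results[dev.name] = Path(os.readlink(link)).name if link.is_symlink() else None
    return results


if __name__ == "__main__":
    for address, driver in find_bound_drivers().items():
        print(address, driver or "(no driver bound)")
```

If a function shows up with no driver, or with a driver other than xdma, the ID table change did not take effect for that function.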
Hi @vineetskumar , try again with community patch #240
Hello @dmitrym1. I already tried with alonbl's and hmaarrfk's forks and they had the exact same error.
@vineetskumar okay, these were worth trying. Error 512 is a timeout: the original XDMA code returns 512 instead of ETIMEDOUT, which causes some confusion. According to the logs, your configuration has two h2c and two c2h channels, and the test app uses them simultaneously. The transfer size and offset look good, but the transfer times out in the driver backend (the part that polls the XDMA IP for status). That could be caused by a wrong status read or a malfunctioning IRQ. However, the log also says that the engine is still BUSY 10 seconds after the transfer started, so it is more likely an FPGA problem: the AXI peripheral did not respond to the XDMA IP's request. That's where I'd start debugging. Try something simple first: add a System ILA to the AXI bus and check that your AXI peripheral is not held in reset. I'd also recommend changing the XDMA config to use only one c2h and one h2c channel for the test.
Please note that I'm not a Xilinx employee and I've never worked with Alveo cards. As I understand it, this card should ship with a reference firmware that passes the test right out of the box. So if you are sure that you've read all the instructions and the problem is not something simple like a reset switch or jumper (if it has any), you may want to file a support ticket on the Xilinx website or post on their community forum. Unfortunately they don't answer here on GitHub.
Anyway, if you feel confident enough to debug this issue yourself, I'd be glad to help.
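As a side note on why user space prints "Unknown error 512": 512 matches the kernel-internal ERESTARTSYS value from include/linux/errno.h, which has no entry in the user-space errno table. Below is a small sketch that decodes such codes when reading test logs. The helper name `describe_errno` is made up for illustration, and the table only covers the kernel-internal range 512 to 516.

```python
import errno
import os

# Kernel-internal codes from include/linux/errno.h. They are not meant to
# reach user space, so libc has no message for them.
KERNEL_INTERNAL = {
    512: "ERESTARTSYS",
    513: "ERESTARTNOINTR",
    514: "ERESTARTNOHAND",
    515: "ENOIOCTLCMD",
    516: "ERESTART_RESTARTBLOCK",
}


def describe_errno(code):
    """Map an errno value (positive or negative) to a readable name."""
    code = abs(code)
    name = errno.errorcode.get(code)
    if name is not None:
        return f"{name} ({os.strerror(code)})"
    if code in KERNEL_INTERNAL:
        return f"{KERNEL_INTERNAL[code]} (kernel-internal, no user-space message)"
    return f"unknown errno {code}"


if __name__ == "__main__":
    print(describe_errno(512))
    print(describe_errno(errno.ETIMEDOUT))
```

Seeing a value from this range in a tool's output usually means a driver leaked an internal return code, which fits the description of the timeout path above.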
Unfortunately, I had the same issue with the reference GOLDEN image: the XDMA tests also fail there with the same timeout and SG busy message (when it has a working BAR for XDMA to write to at all).
I'll try creating a new project with that AXI and ILA though and later reply with how that went.
Same problem:
/dev/xdma0_h2c_2, write 0x400 @ 0x800 failed -1.
write file: Unknown error 512
Linux 12700-ubuntu22 6.8.0-59-generic #61~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 15 17:03:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
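To reproduce a single failing transfer like the one in this log outside run_tests.sh, a minimal sketch along these lines may help. It issues one positioned write and reports the errno instead of aborting. The function name `try_h2c_write` is hypothetical; the size and offset in the usage example are the ones from the log line above, and the payload is an arbitrary byte pattern.

```python
import os


def try_h2c_write(path, size, offset):
    """Write `size` bytes at `offset`; return (bytes_written, errno or None)."""
    # Simple counting pattern so the data is recognizable on read-back.
    payload = bytes(i & 0xFF for i in range(size))
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.pwrite(fd, payload, offset), None
    except OSError as exc:
        # A driver timeout would surface here, e.g. errno 512 as in the log.
        return -1, exc.errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Device node, size and offset taken from the log line above.
    print(try_h2c_write("/dev/xdma0_h2c_2", 0x400, 0x800))
```

Running it against one channel at a time makes it easier to tell whether every h2c channel times out or only some of them.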
@hellocaoziyi Do you use the AXI-Stream or the AXI-MM interface? What is the datapath width? What size of transfers are you trying to perform?
@dmitrym1 Signals have nothing to do with this problem
Sorry, I forgot to close this earlier. I'm pretty sure the problem was that I was flashing the wrong bitstream. I think the bitstream I was working with worked fine once I used its own fork of the xdma driver.