XDMA: transmission time (both H2C and C2H) is not stable
Hi all, I have a problem with the transmission time in H2C and C2H. For example, when transferring 180 KB of data over H2C, XDMA takes about 80 µs. But when I run the test for a long time, the transmission time sometimes increases to about 200 µs. This instability can make our design fail.
Does anyone have an idea how to solve this issue?
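For reference, this is roughly how I measure the transfer time (a simplified sketch; the device node name /dev/xdma0_h2c_0 is the stock XDMA naming and my real buffer handling differs):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define XFER_SIZE (180 * 1024) /* 180 KB, as in the report above */

int main(void)
{
    /* Device node name assumed from the stock XDMA driver convention. */
    int fd = open("/dev/xdma0_h2c_0", O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = aligned_alloc(4096, XFER_SIZE);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = write(fd, buf, XFER_SIZE); /* one H2C transfer */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
              (t1.tv_nsec - t0.tv_nsec) / 1000L;
    printf("wrote %zd bytes in %ld us\n", n, us);

    free(buf);
    close(fd);
    return 0;
}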
I ran into the same problem. Did you find any solution?
I have the same problem too. A C2H transfer usually takes about 5 ms, but sometimes it takes over 15 ms, which makes my program fail.
I think the transmission time is unstable because thread scheduling is not deterministic, especially on a non-real-time OS. I used DDR in the PL to reduce the influence of the unstable transmission time: PL -> DDR -> XDMA C2H, and XDMA H2C -> DDR -> PL. I just need to ensure that the average rate of the DMA channels is higher than the PL input and output rates, so that the DDR can absorb the jitter. I transmit continuous data in my project; the rate can reach 6.4 Gbps with DDR (8 Gbit) and the XDMA IP/driver (ver 2020.2).
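As a rough sizing check (illustrative numbers, not measurements from my board), the buffer only needs to cover the data that arrives from the PL during the worst-case DMA stall:

#include <stdio.h>

int main(void)
{
    /* Illustrative numbers: a 6.4 Gbps continuous PL stream and an
     * assumed worst-case DMA stall of 500 us. */
    const double pl_rate_bps = 6.4e9;
    const double max_stall_s = 500e-6;

    /* Minimum buffer depth needed to ride out one stall. */
    const double min_bits = pl_rate_bps * max_stall_s;
    printf("min buffer: %.0f bits (%.1f KB)\n",
           min_bits, min_bits / 8.0 / 1024.0);
    /* ~3.2 Mbit (~390 KB), so an 8 Gbit DDR leaves ample margin. */
    return 0;
}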
@vutang @xueweiwujxw
Such variation could also be caused by kmalloc memory allocation inside the driver with the standard GFP_KERNEL flag. When the kernel can't find free memory immediately, it runs memory reclamation and compaction routines, and the allocation blocks until they finish. There is nothing that can be done about it from user space.
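Illustrative only (this is not the actual XDMA driver code), the trade-off in kernel code looks like this:

#include <linux/slab.h>

static void *alloc_example(void)
{
    /* GFP_KERNEL: may sleep; under memory pressure the kernel runs
     * reclaim/compaction first, which takes a long, unbounded time. */
    void *p = kmalloc(4096, GFP_KERNEL);

    if (!p) {
        /* GFP_ATOMIC never sleeps: it either succeeds from reserves
         * or fails immediately, so latency is bounded but the
         * allocation can fail under pressure. */
        p = kmalloc(4096, GFP_ATOMIC);
    }
    return p;
}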
@Prandr So there is no way to resolve the unstable transmission time?
@zhaochengjie0405 solution #333 and explanation #332
I will test this change and reply once I have results.
@dmitrym1
Hi, thanks for your work on the XDMA driver. I used your patch on my design, which implements a FIFO AXIS C2H transfer using the Xilinx DMA/Bridge Subsystem for PCIe IP. Your patch helped me reduce the frequency of high-latency spikes between two consecutive reads, but it hasn't completely solved the problem. During my test (millions of reads) I measured an average latency of about 10 µs between reads (thanks to your work), but sometimes high-latency spikes of up to 400 µs still appear!! These spikes seem to be totally random and asynchronous with each other. Have you ever experienced the same issue?
@MauSilvestrini
Do you mean latency between reads (that is, between read calls) or through the calls themselves? How large are your reads?
Anyway, I'd like to suggest trying my reworked version of the driver. It aims to reduce the overhead and thus the latency of the DMA transfers, but please read the docs first. There's no reason to waste time and effort on the garbage here.
Hi @Prandr, many thanks for your help. As I said before, during my tests high-latency spikes between two consecutive read calls sometimes occur; also, please note that I am using the AXI-Stream C2H transfer mode. Here is a snippet of my application code, with polling_mode = 15 enabled:
while (1) {
    read(file_fd, (char *)buffer, size_rd);
    usleep(1);
}
I run my host application with the following settings: write_size = 4 KB and read_size = 64 KB. I have also set CPU affinity, disabled IRQ balancing, and given my application code maximum priority. But high-latency spikes (> 500 µs) still arise sometimes.
@MauSilvestrini
I see. So if the latency is between reads, then the problem lies in what happens between the reads: usleep. When you call it, the OS puts your thread into an inactive state and allows other threads to run in its place. You are then completely at the OS's mercy as to when it gets back to you. If there is a higher-priority thread that has to finish its task, bad luck for you. Thus usleep is not a reliable method for a fixed waiting time; it should rather be treated as a minimum waiting time.
You may try to replace usleep with busy waiting:
while (1) {
    /* volatile prevents the compiler from optimising the counting
       loop away; the counter is reset on each iteration. */
    volatile unsigned int waiting = 0;
    read(file_fd, (char *)buffer, size_rd);
    while (waiting < 10000) waiting++;
}
The major drawback is that it is not portable, and you need trial and error to find a suitable value for waiting. Also, on modern CPUs that vary their clock frequency with load and power, it may fail to achieve a uniform waiting time.
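A variation that avoids the calibration problem is to spin on the monotonic clock instead of a counter; the thread stays runnable, but the delay no longer depends on CPU frequency (a sketch, not code from the driver or the patch):

#include <time.h>

/* Busy-wait for approximately `usec` microseconds by polling
 * CLOCK_MONOTONIC; the thread never sleeps, so the scheduler cannot
 * park it, and the delay is independent of the CPU clock frequency. */
static void spin_wait_us(long usec)
{
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while ((now.tv_sec - start.tv_sec) * 1000000L +
             (now.tv_nsec - start.tv_nsec) / 1000L < usec);
}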
If you badly need stable waiting times, an RTOS or Linux with a PREEMPT_RT kernel is your only bet. See https://bootlin.com/doc/training/preempt-rt/preempt-rt-slides.pdf
I still recommend the reworked version of the driver, as it completely avoids core migrations and context switches in polling mode, making all of @dmitrym1's trickery unnecessary. See the docs in the repo for how to enable it.
Again, I have to thank you for all your suggestions, @Prandr.
However, I forgot to provide you with some further details:
- I chose to insert the usleep delay into my read loop to make the polling-mode read loop less aggressive; however, the latency problem remains unchanged with or without the usleep;
- I have already pinned both the XDMA driver poll thread and the software application to the same CPU, as suggested by @dmitrym1's patch (a sketch of the setup follows this list);
- I have also set the maximum priority (SCHED_FIFO = 99) for the software application;
- I am already using the RHEL RT kernel.
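For completeness, this is roughly what my setup code does (a simplified sketch using the standard Linux APIs; the CPU number is a placeholder, the priority is the 99 mentioned above):

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to one CPU and give it the highest
 * real-time FIFO priority. Returns 0 on success, -1 on error. */
static int pin_and_prioritize(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;

    struct sched_param sp = { .sched_priority = 99 };
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}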
Well, there are other things that may cause delays in the driver, like the above-mentioned memory reclamation and compaction routines. The driver is fundamentally not RT-capable. Unfortunately, I can't help you any further with this driver.