
XDMA: transmission time (both H2C and C2H) is not stable

Open vutang opened this issue 4 years ago • 13 comments

Hi all, I have a problem with the transmission time in both H2C and C2H. For example, when transferring 180 KB of data over H2C, XDMA takes about 80 us. But when I run the test for a long time, the transmission time sometimes increases to about 200 us. This instability can make our design fail.

Does anyone have an idea for solving this issue?

vutang avatar Jan 29 '21 04:01 vutang

I found the same problem. Did you have any solutions?

xueweiwujxw avatar Mar 25 '22 12:03 xueweiwujxw

I have the same problem too. A C2H transfer takes about 5 ms most of the time, but sometimes it takes over 15 ms, which makes my program fail.

pigstimemachine avatar Apr 03 '25 05:04 pigstimemachine

I think the transmission time is unstable because thread scheduling is not deterministic, especially on a non-real-time OS. I used DDR on the PL side to reduce the influence of the unstable transmission time: PL -> DDR -> XDMA C2H, and XDMA H2C -> DDR -> PL. I just need to ensure that the average rate of the DMA channels is faster than the PL input and output rates, so that the DDR can absorb the jitter. I transmit continuous data in my project; the rate can reach 6.4 Gbps with DDR (8 Gbit) and the XDMA dma_ip_drivers (ver 2020.2).

xueweiwujxw avatar Apr 03 '25 06:04 xueweiwujxw

@vutang @xueweiwujxw Such variation could also be caused by kmalloc memory allocation inside the driver with the standard GFP_KERNEL flag. When the kernel can't find free memory immediately, it runs memory reclamation and compaction routines. Nothing can be done about this from user space.

Prandr avatar Apr 09 '25 16:04 Prandr

@Prandr So there is no way to resolve the unstable transmission time?

pigstimemachine avatar Apr 10 '25 10:04 pigstimemachine

@zhaochengjie0405 solution #333 and explanation #332

dmitrym1 avatar Apr 24 '25 17:04 dmitrym1

I will test this change and reply once I have results.

pigstimemachine avatar Apr 25 '25 03:04 pigstimemachine

@dmitrym1

Hi, thanks for your work on the XDMA driver. I applied your patch to my design, which implements FIFO AXI-Stream C2H transfers using the Xilinx DMA/Bridge Subsystem for PCIe IP. Your patch helped me reduce the frequency of high-latency spikes between two consecutive reads, but it hasn't completely solved the problem. During my tests (millions of reads), I measured an average latency of about 10 us between reads (thanks to your work), but sometimes high-latency spikes of up to 400 us still arise! These spikes seem to be totally random and uncorrelated with each other. Have you ever experienced the same issue?

MauSilvestrini avatar Oct 23 '25 14:10 MauSilvestrini

@MauSilvestrini Do you mean latency between reads (that is, between read calls) or within the calls themselves? How large are your reads?

Anyway, I'd suggest trying my reworked version of the driver. It aims to reduce the overhead, and thus the latency, of the DMA transfers, but please read the docs first. There is no reason to waste time and effort on the garbage here.

Prandr avatar Oct 23 '25 15:10 Prandr

Hi @Prandr, many thanks for your help. As I said before, during my tests high-latency spikes between two consecutive read calls sometimes occur; please also note that I am using AXI-Stream C2H transfer mode. Here is a snippet of my application code, with polling_mode = 15 enabled:

while(1) {
	read(file_fd, (char *)buffer, size_rd);
	usleep(1);
}

I run my host application with the following settings: write_size = 4 KB and read_size = 64 KB. I have also set CPU affinity, disabled irqbalance, and given my application code maximum priority. But high-latency spikes (> 500 us) still arise sometimes.

MauSilvestrini avatar Oct 24 '25 07:10 MauSilvestrini

@MauSilvestrini I see. So if the latency is between reads, then the problem lies in what happens between the reads: usleep. When you call it, the OS puts your thread into an inactive state and allows other threads to run in its place. You are then completely at the OS's mercy as to when it gets back to you; if there is a higher-priority thread that has to finish its task, bad luck for you. Thus usleep is not a reliable method for a fixed waiting time; rather, its argument should be treated as a minimum waiting time.

You may try to replace usleep with busy waiting:

while(1) {
	volatile unsigned int waiting = 0;
	read(file_fd, (char *)buffer, size_rd);
	/* spin instead of sleeping, so the thread never yields the CPU */
	while (waiting < 10000) waiting++;
}

The major drawbacks are that this is not portable and that you need trial and error to find a suitable limit for waiting. Also, on modern CPUs that vary their clock frequency with load and power, it may fail to achieve a uniform waiting time. If you badly need stable waiting times, an RTOS or Linux with a PREEMPT_RT kernel is your only real option. See https://bootlin.com/doc/training/preempt-rt/preempt-rt-slides.pdf

I still recommend my reworked version of the driver, as it completely avoids core migrations and context switches in polling mode, making all of @dmitrym1's trickery unnecessary. See the docs in the repo for how to enable it.

Prandr avatar Oct 24 '25 10:10 Prandr

Again, I have to thank you for all your suggestions @Prandr.
However, I forgot to provide some further details:

  1. I chose to insert the usleep delay into my read loop to make polling less aggressive; however, with or without the usleep, the latency problem remains unchanged;
  2. I have already pinned both the XDMA driver poll thread and the software application to the same CPU, as suggested by @dmitrym1's patch;
  3. I have also set maximum priority (SCHED_FIFO = 99) for the application;
  4. I am already using the RHEL RT kernel.

MauSilvestrini avatar Oct 24 '25 12:10 MauSilvestrini

Well, there are other things that may cause delays in the driver, like the above-mentioned memory reclamation and compaction routines. The driver is fundamentally not RT-capable. Unfortunately, I can't help you any further with this driver.

Prandr avatar Oct 24 '25 13:10 Prandr