XRT
Pipeline execution for HLS design with dataflow pragma
Hi,
I'm trying to implement the pipelined execution model described in the XRT documentation. Conceptually, I have the following HLS design:
    host      |  HLS design with dataflow pragma  |      host
               ---------------------------------
input_array => | stage_1 -> stage_2 -> stage_3 | => output_array
               ---------------------------------
-> means streaming channels (AXI-S), while => means external memory access (AXI-MM). I want to implement the following execution schedule in XRT:
|host2device[0]| stage_1 | stage_2 | stage_3 |device2host[0]|
               |host2device[1]| stage_1 | stage_2 | stage_3 |device2host[1]|
                              |host2device[2]| stage_1 | stage_2 | stage_3 |device2host[2]|
                                             ... ...
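To make the expected benefit of such a schedule concrete, here is a minimal timing sketch, not a measurement. It assumes an ideal pipeline in which each phase (host2device, stage_1, stage_2, stage_3, device2host) can overlap with the other phases of neighbouring iterations. The function names and any durations passed in are hypothetical and are not part of XRT or Vitis HLS.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Total time when each iteration runs all of its phases back to back and the
// next iteration starts only after the previous one has fully finished.
std::uint64_t sequential_time(const std::vector<std::uint64_t>& phases,
                              std::uint64_t iterations) {
    const std::uint64_t per_iteration =
        std::accumulate(phases.begin(), phases.end(), std::uint64_t{0});
    return iterations * per_iteration;
}

// Total time for an idealized pipeline: the first iteration pays the full
// latency (sum of all phases), and every further iteration adds only the
// duration of the slowest phase, which acts as the initiation interval.
std::uint64_t pipelined_time(const std::vector<std::uint64_t>& phases,
                             std::uint64_t iterations) {
    if (iterations == 0 || phases.empty()) {
        return 0;
    }
    const std::uint64_t latency =
        std::accumulate(phases.begin(), phases.end(), std::uint64_t{0});
    const std::uint64_t interval =
        *std::max_element(phases.begin(), phases.end());
    return latency + (iterations - 1) * interval;
}
```

For example, with made-up phase durations of {40, 100, 100, 100, 40} microseconds, two iterations take 2 * 380 = 760 microseconds without overlap, but 380 + 100 = 480 microseconds under this idealized model.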
On the HLS side, I have set the following configurations:
syn.interface.s_axilite_mailbox=both
syn.interface.s_axilite_auto_restart_counter=1
syn.interface.s_axilite_sw_reset=true
I tried calling hls_run.start(xrt::autostart{2}) directly, i.e. running two iterations without updating the input array or fetching the output array. However, the execution time is simply 2x that of hls_run.start(xrt::autostart{1}). If pipelined execution were enabled, I would expect a much shorter execution time thanks to overlapping. Is that expectation correct?
Could you advise me on how to enable pipelined execution? For example, did I miss anything on the HLS side or the XRT side?
Best, Hanchen