
Incorrect behavior of kernel events in 2021.1 on edge

Open doonny opened this issue 2 years ago • 3 comments

I am upgrading from 2020.1 to 2021.1, and I have found inconsistent XRT behavior on the zcu102 edge device.

I am testing with two parallel kernels in a producer/consumer arrangement that moves data from DDR and back to DDR as a loopback test. However, the kernel event for one kernel takes much longer to complete than the event for the other:

[image] We can see that kernel DataStore takes much longer to finish than DataLoad.
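To make the expectation behind this loopback test concrete, here is a minimal software sketch of a producer and a consumer joined by a bounded FIFO. It is not XRT or HLS code: `simulate_loopback`, `LoopbackResult`, and the one-word-per-cycle timing are all hypothetical, and the model ignores DDR latency, burst behavior, and kernel start-up overhead. Under those assumptions the consumer can only finish shortly after the producer, which is why a large gap between the two event durations looks wrong.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical software model (not XRT or HLS code): one producer and one
// consumer joined by a bounded FIFO, advanced one step per simulated cycle.
struct LoopbackResult {
    std::vector<std::uint32_t> out;   // data written back by the consumer
    std::size_t producer_done_cycle;  // cycle in which the last word was pushed
    std::size_t consumer_done_cycle;  // cycle in which the last word was popped
};

LoopbackResult simulate_loopback(const std::vector<std::uint32_t>& in,
                                 std::size_t fifo_depth) {
    LoopbackResult r{{}, 0, 0};
    if (fifo_depth == 0) return r;  // a zero-depth FIFO could never make progress

    std::deque<std::uint32_t> fifo;
    std::size_t produced = 0;
    std::size_t cycle = 0;

    while (r.out.size() < in.size()) {
        ++cycle;
        // Consumer goes first, so it only sees words pushed in earlier cycles.
        if (!fifo.empty()) {
            r.out.push_back(fifo.front());
            fifo.pop_front();
            if (r.out.size() == in.size()) r.consumer_done_cycle = cycle;
        }
        // Producer stalls for this cycle when the FIFO is full (blocking write).
        if (produced < in.size() && fifo.size() < fifo_depth) {
            fifo.push_back(in[produced++]);
            if (produced == in.size()) r.producer_done_cycle = cycle;
        }
    }
    return r;
}
```

In this idealized model the two completion cycles differ by a single cycle regardless of the FIFO depth, so any much larger difference would have to come from something the model leaves out, such as memory behavior or how the runtime records event completion.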

However, in the kernel profile section, we can see that both kernels transfer the same amount of data at a similar speed:

[image]
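As a sanity check on the "same amount of data" observation, the expected volume per kernel can be worked out from the interface widths in the kernel code in this post: each kernel has two 512-bit `m_axi` ports and moves `data_num` words on each. The sketch below is plain C++ arithmetic, with hypothetical helper names (`expected_bytes`, `kWordBytes`, `kPortsPerKernel`), and it assumes every loop iteration transfers exactly one full word per port.

```cpp
#include <cstddef>
#include <cstdint>

// Values taken from the kernel sources: VEC_SIZE = 16, 32 bits per element.
constexpr std::size_t kVecSize = 16;
constexpr std::size_t kWordBytes = kVecSize * 32 / 8;  // 512-bit word = 64 bytes
constexpr std::size_t kPortsPerKernel = 2;             // two m_axi ports per kernel

// Expected bytes moved over m_axi by either kernel for a given data_num,
// assuming one full word per port per loop iteration.
constexpr std::uint64_t expected_bytes(std::uint64_t data_num) {
    return data_num * kWordBytes * kPortsPerKernel;
}
```

For example, `expected_bytes(1024)` is 131072 bytes (128 KiB) for DataLoad and for DataStore alike, so equal transfer totals in the profile are what the code predicts.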

Moreover, the guidance page reports that kernel DataStore was not USED??? [image]

This is what it looks like in Vitis 2020.1, where the two kernels have the same execution time: [image]

The code is very simple:

```cpp
#include <stdio.h>
#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

#define VEC_SIZE 16

typedef ap_uint<VEC_SIZE*32> data_vec;

typedef ap_axiu<VEC_SIZE*32,0,0,0> k2k_data;

extern "C" {
void DataLoad(
    const data_vec *A_in,
    const data_vec *C_in,
    const unsigned int data_num,
    hls::stream<k2k_data> &stream_out_0,
    hls::stream<k2k_data> &stream_out_1
) {
#pragma HLS INTERFACE m_axi port = A_in offset = slave bundle = gmem0 depth = 32 // group-0
#pragma HLS INTERFACE m_axi port = C_in offset = slave bundle = gmem1 depth = 32 // group-1
#pragma HLS INTERFACE axis port = stream_out_0 depth = 16
#pragma HLS INTERFACE axis port = stream_out_1 depth = 16

    k2k_data tmp1, tmp2;

    for (unsigned int i = 0; i < data_num; i++) {
        tmp1.data = A_in[i];
        tmp2.data = C_in[i];
        // blocking streaming access
        stream_out_0.write(tmp1);
        stream_out_1.write(tmp2);
    }
}
}
```

```cpp
#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

#define VEC_SIZE 16

typedef ap_uint<VEC_SIZE*32> data_vec;

typedef ap_axiu<VEC_SIZE*32,0,0,0> k2k_data;

extern "C" {
void DataStore(
    data_vec *B_out, // HBM[1]
    data_vec *D_out, // HBM[3]
    const unsigned int data_num,
    hls::stream<k2k_data> &stream_in_0,
    hls::stream<k2k_data> &stream_in_1
) {
#pragma HLS INTERFACE m_axi port = B_out offset = slave bundle = gmem2 // group-0
#pragma HLS INTERFACE m_axi port = D_out offset = slave bundle = gmem3 // group-1
#pragma HLS INTERFACE axis port = stream_in_0 depth = 16
#pragma HLS INTERFACE axis port = stream_in_1 depth = 16

    k2k_data tmp;

    for (unsigned int i = 0; i < data_num; i++) {
        tmp = stream_in_0.read();
        B_out[i] = tmp.data;
        tmp = stream_in_1.read();
        D_out[i] = tmp.data;
    }
}
}
```

doonny • Jan 28 '22 01:01

@jvillarre Can you help comment on this?

stsoe • Feb 11 '22 03:02

Any updates on this issue?

doonny • Oct 06 '22 00:10

@doonny, unfortunately, this has fallen off our radar. Frankly speaking, 21.1 is too old a release for us to investigate. If you can try the latest 22.1 release and still see the problem, that would be great.

uday610 • Oct 19 '22 22:10