radeon_gpu_profiler
[QA] wavefront occupancy is hard to understand
const static std::string mul_kernel = R"(
__kernel void MulAssign(__global uint* in0_out, __global uint* in1,
                        const uint index_start, const uint index_num,
                        const uint index_length) {
  uint tid = get_global_id(0);
  uint tnum = get_global_size(0);
  lu_t in0_out_tmp;  // uint[12]
  lu_t in1_tmp;      // uint[12]
  // Grid-stride loop over the elements assigned to this work-item.
  for (uint i = tid + index_start; i < index_start + index_num; i += tnum) {
    // Load the N limbs of element i; consecutive limbs are index_length apart.
    for (uint j = 0, k = i; j < N; ++j, k += index_length) {
      in0_out_tmp[j] = in0_out[k];
      in1_tmp[j] = in1[k];
    }
    for (uint j = 0; j < COUNT; ++j) {
      // Compute-heavy section: a lot of arithmetic, no global memory access.
      MU_FUNC(MulAssign)(in0_out_tmp, in1_tmp);
      MU_FUNC(MulAssign)(in1_tmp, in0_out_tmp);
    }
    // Store the result back using the same strided layout.
    for (uint j = 0, k = i; j < N; ++j, k += index_length) {
      in0_out[k] = in0_out_tmp[j];
    }
  }
}
)";
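To make the memory access pattern of the kernel easier to follow, here is a minimal host-side C++ sketch that replays the same index arithmetic on the CPU. The helper name `touched_offsets` and its parameter `n` (standing in for the kernel's `N` macro) are hypothetical and only for illustration. The sketch assumes `tnum` is greater than zero.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the original code): returns the buffer
// offsets that work-item `tid` reads in the kernel's load loop, in order.
// Assumes tnum > 0; `n` stands in for the kernel's N macro.
std::vector<std::uint32_t> touched_offsets(std::uint32_t tid, std::uint32_t tnum,
                                           std::uint32_t index_start,
                                           std::uint32_t index_num,
                                           std::uint32_t index_length,
                                           std::uint32_t n) {
    std::vector<std::uint32_t> offsets;
    // Same bounds and step as the kernel's outer grid-stride loop.
    for (std::uint32_t i = tid + index_start; i < index_start + index_num; i += tnum) {
        // Limb j of element i is read from offset i + j * index_length.
        for (std::uint32_t j = 0, k = i; j < n; ++j, k += index_length) {
            offsets.push_back(k);
        }
    }
    return offsets;
}
```

For example, with 4 work-items, 8 elements, `index_length` = 100 and `n` = 3, work-item 1 handles elements 1 and 5 and reads offsets 1, 101, 201, 5, 105, 205. None of these offsets depend on COUNT, so changing COUNT only changes the amount of arithmetic between the loads and the stores.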
When I set different values for COUNT, the wavefront occupancy changes. Why?
for COUNT=1024
for COUNT=32
@chesik-amd can you help me?
I can't really say why the occupancy changes based solely on the screenshots. If you think the occupancy isn't accurate, please provide the .rgp files with the problem and we can take a look to see if there is something going wrong.
@chesik-amd OK, but I don't know how to send you my profile file.
I noticed a very large time gap here. Can you explain it?
@chesik-amd