HIP
HIP copied to clipboard
HIP does not `#pragma unroll` loop in some cases
In some cases, HIP appears to be incapable of unrolling a loop, where nvcc
does so without a problem, as is the case in the following example:
#include <hip/hip_runtime.h>
template<int m, int n, int skip>
__global__ void my_kernel(const int* data_src, int* data_dest){
const int mn = m * n;
const int tidx = threadIdx.x;
#pragma unroll
for (int i = tidx; i < mn; i += skip){
data_dest[i] = data_src[i];
}
}
int main(int argc, char** argv){
typedef void (*kernel)(const int*, int*);
static kernel kern_func = my_kernel<23,23,192>;
return 0;
}
Compilation with hipcc
produces this warning:
warning: <unknown>:0:0: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering
while compilation with nvcc
is silent.
After trying different way on compiling it.
- the variable in for-loop condition should be the same among the threads (not sure)
- can not use break inside the for loop hipcc does not complain about the following.
#include <hip/hip_runtime.h>
template<int m, int n, int skip>
__global__ void my_kernel(const int* data_src, int* data_dest){
const int mn = m * n;
const int tidx = threadIdx.x;
#pragma unroll
for (int i = 0; i < mn; i += skip){
if (i + tidx < mn) {
data_dest[i+tidx] = data_src[i+tidx];
}
}
}
int main(int argc, char** argv){
typedef void (*kernel)(const int*, int*);
static kernel kern_func = my_kernel<23,23,192>;
return 0;
}
@shoshijak Apologies for the lack of response. Can you please test with latest ROCm 6.1.0 (HIP 6.1)? If resolved, please close ticket. Thanks!
@shoshijak Closing issue. Please re-open if issue still occurs with latest ROCm 6.1.1 Thanks!
@shoshijak does NVCC not complain because it doesn't even attempt to unroll it or it actually unrolls it, have you checked the NVCC temp files to make sure NVCC actually unrolls that loop?
@ppanchad-amd I am seeing this issue in latest ROCM so issue is still there.
@shoshijak @doru1004 Internal ticket has been created to fix this issue. Thanks!
Hi @shoshijak @doru1004, thanks for identifying this issue. HIP currently supports unrolling loops with bounds that are defined at compile-time; see https://rocm.docs.amd.com/projects/HIP/en/docs-6.0.0/reference/kernel_language.html#pragma-unroll. In this case, mn is defined at run-time, so hipcc is unable to unroll the loop. You can address this issue by providing the number of loop iterations (which you will have to calculate beforehand based on your problem size) to unroll explicitly.
I'm closing this issue, but feel free to reopen if this does not address your problem.
Thank you for the answer @schung-amd The problem as I see it is actually not mn because mn is known at compile time because m and n are both known at compile time as they are constants passed to a template. The real problem is tidx. tidx is a runtime value as it is the thread ID. As it is the lower bound of that loop then the loop doesn't get unrolled. If you replace tidx with 0 it will unroll but then of course the semantics changes.
Thanks for the clarifying followup @doru1004, you're correct. The issue lies with tidx as you stated, and modifying the logic so that tidx is not involved in the bounds (i.e. https://github.com/ROCm/HIP/issues/1411#issuecomment-688995185) will allow the loop to be unrolled. #pragma unroll using the thread id in the loop bounds as in the original example code is unsupported.