hiperc
unroll convolution
The inner loops of the CUDA convolution code should run faster with a
#pragma unroll directive.
- #pragma unroll N in CUDA
- #pragma unroll N in OpenCL
- Unavailable for OpenACC, but try the -Munroll flag with pgcc
- #pragma unroll N for icc
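
To show where the directive would go, here is a minimal host-side C sketch of a 2D convolution whose mask loops have a compile-time trip count. It is not the project's actual kernel: the function name `convolve`, the `MASK_W` constant, and the row-major array layout are assumptions made for illustration. In a CUDA kernel the two outer loops would typically be replaced by the thread indices, and the pragmas would sit on the same inner loops.

```c
#define MASK_W 3  /* hypothetical mask width; must be known at compile time */

/* Convolve an nx-by-ny row-major field with a MASK_W x MASK_W mask.
 * Cells within MASK_W/2 of an edge are left untouched. */
void convolve(const double* in, double* out, const double* mask,
              int nx, int ny)
{
    const int h = MASK_W / 2;

    for (int j = h; j < ny - h; j++) {
        for (int i = h; i < nx - h; i++) {
            double sum = 0.0;

            /* N = 3 matches MASK_W, so each mask loop is fully unrolled.
             * Compilers that do not recognize this pragma ignore it,
             * possibly with a warning. */
            #pragma unroll 3
            for (int mj = 0; mj < MASK_W; mj++) {
                #pragma unroll 3
                for (int mi = 0; mi < MASK_W; mi++) {
                    sum += mask[mj * MASK_W + mi]
                         * in[(j + mj - h) * nx + (i + mi - h)];
                }
            }

            out[j * nx + i] = sum;
        }
    }
}
```

The pragma is only a hint, and unrolling pays off mainly when the trip count is small and fixed, as it is for a convolution mask. Whether it actually helps should be confirmed by timing each build.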