dct_cuda
Some ideas for improving dct and idct
- multiplying scale in precomputeExpk
- zero paddings to avoid branch divergence
- in-place or out-of-place cufft, especially in idct
- number of threads in idct: M/2 * N/2 or M/2 * (N/2+1)
- other improvements based on profiling
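
The first idea can be sketched on the host side. The following is a minimal C++ illustration, not the repository's actual kernel code: it assumes that precomputeExpk builds a table of twiddle factors exp(-i*pi*k/(2N)) used to post-process an N-point FFT into a DCT-II, and shows that folding the output scale into that table removes a separate multiply per output element. The names `precomputeExpkScaled`, `naiveDft`, `dctViaFft` and `dctDirect` are hypothetical, the naive DFT stands in for the cuFFT call, and the scale value is left to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical host-side analogue of precomputeExpk (assumed layout):
// expk[k] = scale * exp(-i * pi * k / (2N)). scale = 1 gives the plain table.
std::vector<cplx> precomputeExpkScaled(std::size_t N, double scale) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> expk(N);
    for (std::size_t k = 0; k < N; ++k) {
        const double theta = pi * double(k) / (2.0 * double(N));
        expk[k] = cplx(scale * std::cos(theta), -scale * std::sin(theta));
    }
    return expk;
}

// Naive O(N^2) DFT, standing in for the FFT library call.
std::vector<cplx> naiveDft(const std::vector<cplx>& v) {
    const std::size_t N = v.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> V(N);
    for (std::size_t k = 0; k < N; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double phi = -2.0 * pi * double(n) * double(k) / double(N);
            acc += v[n] * cplx(std::cos(phi), std::sin(phi));
        }
        V[k] = acc;
    }
    return V;
}

// Scaled DCT-II: reorder, transform, then a single twiddle multiply per
// output. The scale is already inside expk, so no extra multiply is needed.
std::vector<double> dctViaFft(const std::vector<double>& x, double scale) {
    const std::size_t N = x.size();
    std::vector<cplx> v(N);
    for (std::size_t n = 0; n < N; ++n) {
        // Even samples fill the front; odd samples fill the back in reverse.
        if (n % 2 == 0) v[n / 2] = x[n];
        else            v[N - 1 - n / 2] = x[n];
    }
    const std::vector<cplx> V = naiveDft(v);
    const std::vector<cplx> expk = precomputeExpkScaled(N, scale);
    std::vector<double> out(N);
    for (std::size_t k = 0; k < N; ++k) out[k] = (expk[k] * V[k]).real();
    return out;
}

// Direct definition, used only as a reference:
// X[k] = scale * sum_n x[n] * cos(pi * (2n + 1) * k / (2N)).
std::vector<double> dctDirect(const std::vector<double>& x, double scale) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> out(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            out[k] += x[n] * std::cos(pi * double(2 * n + 1) * double(k) / (2.0 * double(N)));
        }
        out[k] *= scale;
    }
    return out;
}
```

Calling `dctViaFft(x, scale)` and `dctDirect(x, scale)` on the same input should agree to rounding error. In a CUDA version the same idea would move the scale multiply out of the post-processing kernel and into the one-time table setup.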