
tempBuffer[0] result is wrong in ComputeSCDHistogramsDivergence for non-AMD graphics card

Open qianyili opened this issue 1 year ago • 0 comments

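On NVIDIA cards, although the warp size is 32, a barrier is still required before a lane reads shared memory written by another lane in the same warp. Without it, the final sum in tempBuffer[0] is wrong. The unbarriered tail of the reduction presumably relies on AMD's 64-wide waves, where the last 64 active threads run in a single wave.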

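From the NVIDIA CUDA documentation: "If the compute operation only reads shared memory written to by other threads in the same warp as the current thread, __syncwarp() suffices." In other words, even within a single warp some synchronization is still needed.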

On Intel cards, where the warp size is 16, the result may be wrong as well. I haven't checked it yet.

void ComputeSCDHistogramsDivergence()
{
    ...
    if (iLocalIndex < 64) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 64];
    FFX_GROUP_MEMORY_BARRIER;

    // The steps below give wrong results on non-AMD GPUs because there is no FFX_GROUP_MEMORY_BARRIER between them.
    if (iLocalIndex < 32) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 32];
    if (iLocalIndex < 16) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 16];
    if (iLocalIndex < 8 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 8];
    if (iLocalIndex < 4 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 4];
    if (iLocalIndex < 2 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 2];
    if (iLocalIndex < 1 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 1];
    FFX_GROUP_MEMORY_BARRIER;

    filteredHistogram[iLocalIndex] /= tempBuffer[0];

    ...
}
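
One possible fix, as a minimal sketch rather than a tested patch: add a barrier after every reduction step, so that each step's writes are visible before the next step reads them, whatever the hardware wave size. The fragment is meant to replace the unbarriered steps inside ComputeSCDHistogramsDivergence and uses only the names shown above. It assumes that FFX_GROUP_MEMORY_BARRIER both synchronizes the thread group and makes shared-memory writes visible, as its use after the stride-64 step suggests.

```hlsl
// Sketch only: same reduction as above, with a barrier after every step.
// Each barrier sits outside the `if`, so every thread in the group reaches it.
if (iLocalIndex < 32) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 32];
FFX_GROUP_MEMORY_BARRIER;
if (iLocalIndex < 16) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 16];
FFX_GROUP_MEMORY_BARRIER;
if (iLocalIndex < 8 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 8];
FFX_GROUP_MEMORY_BARRIER;
if (iLocalIndex < 4 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 4];
FFX_GROUP_MEMORY_BARRIER;
if (iLocalIndex < 2 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 2];
FFX_GROUP_MEMORY_BARRIER;
if (iLocalIndex < 1 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 1];
FFX_GROUP_MEMORY_BARRIER;
```

This adds five barriers compared with the original. The result should stay the same on AMD hardware, since the extra barriers only enforce an ordering the code already depends on. I have not measured the performance cost.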

qianyili avatar Nov 06 '24 12:11 qianyili