FidelityFX-SDK
tempBuffer[0] result is wrong in ComputeSCDHistogramsDivergence on non-AMD graphics cards
On NVIDIA cards, although the warp size is 32, a barrier is still needed before a lane reads shared memory written by another lane in the same warp; without it, the final sum in tempBuffer[0] is wrong.
From the NVIDIA CUDA documentation: "If the compute operation only reads shared memory written to by other threads in the same warp as the current thread, __syncwarp() suffices."
On Intel cards the warp size is 16, so the result may be wrong there as well. I haven't checked that yet.
void ComputeSCDHistogramsDivergence()
{
...
if (iLocalIndex < 64) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 64];
FFX_GROUP_MEMORY_BARRIER;
// The steps below are incorrect: there is no FFX_GROUP_MEMORY_BARRIER between them
if (iLocalIndex < 32) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 32];
if (iLocalIndex < 16) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 16];
if (iLocalIndex < 8 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 8];
if (iLocalIndex < 4 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 4];
if (iLocalIndex < 2 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 2];
if (iLocalIndex < 1 ) tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + 1];
FFX_GROUP_MEMORY_BARRIER;
filteredHistogram[iLocalIndex] /= tempBuffer[0];
...
}
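
One possible fix is to place a barrier after every reduction step instead of relying on lock-step execution within a warp. The following is a minimal standalone sketch in plain HLSL, not the SDK's actual code. It assumes a thread group of 128 threads, a `float` `tempBuffer`, and that `FFX_GROUP_MEMORY_BARRIER` corresponds to `GroupMemoryBarrierWithGroupSync()` on the HLSL path. `GROUP_SIZE`, `inputValues`, `outputSum` and `CSMain` are hypothetical names used only for illustration.

```hlsl
// Hypothetical standalone reduction sketch; only tempBuffer and iLocalIndex
// come from the original report, everything else is an assumption.
#define GROUP_SIZE 128  // assumed thread-group size

groupshared float tempBuffer[GROUP_SIZE];

StructuredBuffer<float>   inputValues : register(t0);  // hypothetical input
RWStructuredBuffer<float> outputSum   : register(u0);  // hypothetical output

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint iLocalIndex : SV_GroupIndex)
{
    // Each thread stores one value, then the whole group synchronizes.
    tempBuffer[iLocalIndex] = inputValues[iLocalIndex];
    GroupMemoryBarrierWithGroupSync();

    // Tree reduction: the active range is halved at every step.
    // The barrier sits outside the 'if', so every thread in the group
    // reaches it, and each step sees the writes of the previous step
    // regardless of the hardware wave/warp size.
    [unroll]
    for (uint stride = GROUP_SIZE / 2; stride > 0; stride >>= 1)
    {
        if (iLocalIndex < stride)
        {
            tempBuffer[iLocalIndex] += tempBuffer[iLocalIndex + stride];
        }
        GroupMemoryBarrierWithGroupSync();
    }

    // tempBuffer[0] now holds the sum of all GROUP_SIZE inputs.
    if (iLocalIndex == 0)
    {
        outputSum[0] = tempBuffer[0];
    }
}
```

The extra barriers cost some synchronization on hardware where a single wave would cover the last steps, but they remove the dependence on wave size, which is the point on NVIDIA and Intel cards.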