
Bug in Flash with rng dropout sample

Open wfoy opened this issue 9 months ago • 0 comments

I believe I've found a bug in the Flash with rng dropout sample test:

if (seq_len_override) {
        Surface<int32_t> devActualSeqlenQ(b, false);
        Surface<int32_t> devActualSeqlenKV(b, false);
        std::vector<int32_t> hostActualSeqlenQ(b, 20);
        std::vector<int32_t> hostActualSeqlenKV(b, 20);
        
        checkCudaErr(cudaMemcpy(devActualSeqlenQ.devPtr,
                                hostActualSeqlenQ.data(),
                                sizeof(hostActualSeqlenQ[0]) * b,
                                cudaMemcpyHostToDevice));
        checkCudaErr(cudaMemcpy(devActualSeqlenKV.devPtr,
                                hostActualSeqlenKV.data(),
                                sizeof(hostActualSeqlenKV[0]) * b,
                                cudaMemcpyHostToDevice));
        checkCudaErr(cudaDeviceSynchronize());
        
        variant_pack[seq_q]  = devActualSeqlenQ.devPtr;
        variant_pack[seq_kv] = devActualSeqlenKV.devPtr;
}
    
Surface<float> statsTensor(b * h * s_q * 1, false);
if (is_inference == false) {
        variant_pack[stats] = statsTensor.devPtr;
}

When the devActualSeqlenQ and devActualSeqlenKV Surfaces go out of scope at the end of the if statement, their destructors are called and both devPtrs are cudaFree'd, even though those pointers are still stored in the variant pack.
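To make the lifetime problem concrete without needing a GPU, here is a minimal host-only sketch. `MockSurface`, `g_live`, and `pointer_still_live_after_scope` are hypothetical stand-ins I made up for illustration (they are not the sample's `Surface` or any cuDNN API); `malloc`/`free` stand in for `cudaMalloc`/`cudaFree`, and the pack stores integer handles rather than raw pointers so the check itself never touches freed memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <set>
#include <unordered_map>

// Hypothetical bookkeeping: handles of allocations that are currently live.
static std::set<std::uintptr_t> g_live;

// Hypothetical, simplified stand-in for the sample's Surface<T>.
// It owns its buffer and releases it in the destructor (RAII).
template <typename T>
struct MockSurface {
    T* devPtr = nullptr;

    explicit MockSurface(std::size_t n) {
        devPtr = static_cast<T*>(std::malloc(n * sizeof(T)));  // stands in for cudaMalloc
        g_live.insert(handle());
    }
    ~MockSurface() {
        g_live.erase(handle());
        std::free(devPtr);  // stands in for cudaFree
    }
    MockSurface(const MockSurface&) = delete;
    MockSurface& operator=(const MockSurface&) = delete;

    std::uintptr_t handle() const { return reinterpret_cast<std::uintptr_t>(devPtr); }
};

// Mirrors the pattern from the sample: the surface is declared inside the if block,
// and its address is stored in a pack that outlives the block.
// Returns true only if the stored address still refers to a live allocation.
bool pointer_still_live_after_scope(bool seq_len_override) {
    std::unordered_map<int, std::uintptr_t> variant_pack;
    const int seq_q = 0;  // hypothetical key, standing in for the sample's seq_q uid

    if (seq_len_override) {
        MockSurface<std::int32_t> devActualSeqlenQ(4);
        variant_pack[seq_q] = devActualSeqlenQ.handle();
    }  // destructor runs here and releases the buffer

    auto it = variant_pack.find(seq_q);
    return it != variant_pack.end() && g_live.count(it->second) > 0;
}
```

Calling `pointer_still_live_after_scope(true)` returns `false`: by the time the pack would be consumed, the address it holds has already been released.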

The bug is hidden because, when the statsTensor Surface is constructed, the next cudaMalloc call returns a device address that was just freed, so statsTensor.devPtr ends up being the same address as either the devActualSeqlenQ or the devActualSeqlenKV devPtr. I'm not sure how the test still passes, given that the device buffers the hostActualSeqlen vectors were cudaMemcpy'd into are freed before the variant pack is used.
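One possible way to avoid this, sketched here on the host only and not taken from the repository, is to declare the buffers in the enclosing scope so they outlive the variant pack, and only construct them when the override is requested. `ScopedBuffer`, `g_live_allocs`, and `pointer_live_at_execute` are hypothetical names for illustration; `malloc`/`free` again stand in for `cudaMalloc`/`cudaFree`, and the sketch assumes C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <set>
#include <unordered_map>

// Hypothetical bookkeeping: handles of allocations that are currently live.
static std::set<std::uintptr_t> g_live_allocs;

// Hypothetical RAII buffer, standing in for the sample's Surface<T>.
template <typename T>
struct ScopedBuffer {
    T* devPtr = nullptr;

    explicit ScopedBuffer(std::size_t n) {
        devPtr = static_cast<T*>(std::malloc(n * sizeof(T)));  // stands in for cudaMalloc
        g_live_allocs.insert(handle());
    }
    ~ScopedBuffer() {
        g_live_allocs.erase(handle());
        std::free(devPtr);  // stands in for cudaFree
    }
    ScopedBuffer(const ScopedBuffer&) = delete;
    ScopedBuffer& operator=(const ScopedBuffer&) = delete;

    std::uintptr_t handle() const { return reinterpret_cast<std::uintptr_t>(devPtr); }
};

// The buffer is declared outside the if block, so it stays alive until the
// function returns, which is after the point where the pack would be consumed.
bool pointer_live_at_execute(bool seq_len_override) {
    std::unordered_map<int, std::uintptr_t> variant_pack;
    const int seq_q = 0;  // hypothetical key, standing in for the sample's seq_q uid

    std::optional<ScopedBuffer<std::int32_t>> devActualSeqlenQ;  // empty until needed
    if (seq_len_override) {
        devActualSeqlenQ.emplace(4);  // allocate only when the override is on
        variant_pack[seq_q] = devActualSeqlenQ->handle();
    }

    // This is where the graph would be executed with variant_pack.
    auto it = variant_pack.find(seq_q);
    return it != variant_pack.end() && g_live_allocs.count(it->second) > 0;
}
```

With this arrangement `pointer_live_at_execute(true)` returns `true`, and the buffer is still released automatically when the function returns. In the actual sample the equivalent change would presumably be moving the two Surface declarations above the if statement, but I haven't tested that against the repository.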


wfoy, May 21 '24 17:05