cudnn-frontend
Bug in Flash with rng dropout sample
I believe I've found a bug in the "Flash with rng dropout" sample test:
```cpp
if (seq_len_override) {
    Surface<int32_t> devActualSeqlenQ(b, false);
    Surface<int32_t> devActualSeqlenKV(b, false);
    std::vector<int32_t> hostActualSeqlenQ(b, 20);
    std::vector<int32_t> hostActualSeqlenKV(b, 20);
    checkCudaErr(cudaMemcpy(devActualSeqlenQ.devPtr,
                            hostActualSeqlenQ.data(),
                            sizeof(hostActualSeqlenQ[0]) * b,
                            cudaMemcpyHostToDevice));
    checkCudaErr(cudaMemcpy(devActualSeqlenKV.devPtr,
                            hostActualSeqlenKV.data(),
                            sizeof(hostActualSeqlenKV[0]) * b,
                            cudaMemcpyHostToDevice));
    checkCudaErr(cudaDeviceSynchronize());
    variant_pack[seq_q] = devActualSeqlenQ.devPtr;
    variant_pack[seq_kv] = devActualSeqlenKV.devPtr;
}
Surface<float> statsTensor(b * h * s_q * 1, false);
if (is_inference == false) {
    variant_pack[stats] = statsTensor.devPtr;
}
```
When the `devActualSeqlenQ` and `devActualSeqlenKV` Surfaces go out of scope at the end of the `if` block, their destructors run and both `devPtr`s are `cudaFree`'d, even though the variant pack still holds those pointers.

The bug is hidden because the next `cudaMalloc`, issued when the `statsTensor` Surface is constructed, returns the device address that was just freed. As a result, `statsTensor.devPtr` ends up equal to either the `devActualSeqlenQ` or the `devActualSeqlenKV` `devPtr`. I'm not sure how the test still passes, given that the device buffers holding the `hostActualSeqlen` values are freed right after those values are copied in.
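One possible fix, sketched under the assumption that the Surfaces only need to outlive graph execution: declare the owners in the enclosing scope (here with `std::optional`) and construct them inside the `if`. Everything below is a hypothetical host-only stand-in, not the sample's actual code. `MockSurface`, `g_live`, `variant_pack_is_valid`, `build_and_check` and the integer uids are illustrative names, `std::malloc`/`std::free` replace `cudaMalloc`/`cudaFree` so it runs without a GPU, and the final check stands in for executing the graph with the variant pack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <map>
#include <optional>
#include <set>
#include <vector>

// Hypothetical registry of addresses currently owned by a live MockSurface.
std::set<void*> g_live;

// Hypothetical stand-in for the sample's Surface<T>: allocates in the constructor
// and frees in the destructor (std::malloc/std::free instead of cudaMalloc/cudaFree).
template <typename T>
struct MockSurface {
    T* devPtr = nullptr;
    explicit MockSurface(std::size_t n) {
        devPtr = static_cast<T*>(std::malloc(n * sizeof(T)));
        assert(devPtr != nullptr);
        g_live.insert(devPtr);
    }
    ~MockSurface() {
        g_live.erase(devPtr);
        std::free(devPtr);
    }
    MockSurface(const MockSurface&) = delete;
    MockSurface& operator=(const MockSurface&) = delete;
};

// Returns true if every pointer in the pack is still owned by a live MockSurface.
bool variant_pack_is_valid(const std::map<int, void*>& pack) {
    for (const auto& entry : pack) {
        if (g_live.count(entry.second) == 0) return false;
    }
    return true;
}

// Mirrors the sample's control flow with the ownership fix applied.
bool build_and_check(bool seq_len_override, bool is_inference, std::size_t b) {
    const int seq_q = 0, seq_kv = 1, stats = 2;  // hypothetical tensor uids
    std::map<int, void*> variant_pack;

    // Fix: the owners live in the enclosing scope, so leaving the if block
    // no longer frees the buffers the variant pack points to.
    std::optional<MockSurface<int32_t>> devActualSeqlenQ;
    std::optional<MockSurface<int32_t>> devActualSeqlenKV;
    if (seq_len_override) {
        devActualSeqlenQ.emplace(b);
        devActualSeqlenKV.emplace(b);
        std::vector<int32_t> hostActualSeqlenQ(b, 20);
        std::vector<int32_t> hostActualSeqlenKV(b, 20);
        // Stand-ins for the cudaMemcpy host-to-device copies.
        std::memcpy(devActualSeqlenQ->devPtr, hostActualSeqlenQ.data(), sizeof(int32_t) * b);
        std::memcpy(devActualSeqlenKV->devPtr, hostActualSeqlenKV.data(), sizeof(int32_t) * b);
        variant_pack[seq_q] = devActualSeqlenQ->devPtr;
        variant_pack[seq_kv] = devActualSeqlenKV->devPtr;
    }

    MockSurface<float> statsTensor(b);  // the sample allocates b * h * s_q floats
    if (!is_inference) {
        variant_pack[stats] = statsTensor.devPtr;
    }

    // Stand-in for executing the graph: every pointer must still be live here.
    return variant_pack_is_valid(variant_pack);
}
```

With this layout the seq-len buffers and `statsTensor` are distinct live allocations when the variant pack is consumed, so the address reuse that masks the bug cannot occur. All three buffers are released together when the enclosing function returns.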