Only the first device result aligns with the host’s computation

Open mengllm opened this issue 5 months ago • 0 comments

@and-ivanov @benrothen

Hi,

Regardless of whether I use generated.bin or extracted.bin, and whether I use checksum_kernel or checksum_kernel_from_data, the device's checksum result changes with each execution of cuLaunchKernel. Only the first device result matches the host's computation. Is this behavior expected? If so, how can the device result be verified against the host?

I added a print statement to the code:

int warmup_repeats = warmup ? 10 : 0;
State device_result_temp;
for (int iters = 0; iters < warmup_repeats + repeats; iters++) {
    Time t1 = timer::now();
    CUDA_DRV_CHECK(cuLaunchKernel(checksum_kernel,
        /* grid size */ gridSize, 1, 1,
        /* block size */ blockSize, 1, 1,
        /* shared mem */ 0, /* stream */ nullptr, args, 0));
    CUDA_DRV_CHECK(cuCtxSynchronize()); // wait for the kernel to finish
    Time t2 = timer::now();
    if (iters >= warmup_repeats) {
        runtime += seconds(t2 - t1);
    }
    // copy the device state back after every launch and print its checksum
    CUDA_DRV_CHECK(cuMemcpyDtoH(&device_result_temp, device_state, sizeof(State)));
    printf("execution %d: device_result_temp.c: %" PRIx32 " \n", iters, device_result_temp.c);
}

Logs:

execution 0: device_result_temp.c: 8442a62d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 1: device_result_temp.c: 8387062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 2: device_result_temp.c: 203062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 3: device_result_temp.c: 26ab062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 4: device_result_temp.c: f224462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 5: device_result_temp.c: 6a21462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 6: device_result_temp.c: edec462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 7: device_result_temp.c: 22efe62d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 8: device_result_temp.c: c0d4062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 9: device_result_temp.c: 81fa862d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 10: device_result_temp.c: 7129e62d
checksum Runtime: 1.1418 s
result DtoH took 2.1416e-05 s size:104
GPU clocks: 1609585358
Optimal clocks 684800000
Observed 43 % of peak performance
Computing checksum on host... 100%
Verification on host took 376.599 s
verification FAILED! dev: 7129e62d host: 8442a62d

Meanwhile, no matter how many times the host computes the checksum, the result remains unchanged:

Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d

mengllm, Sep 23 '24 02:09