HAIS
CUDA kernel failed : an illegal memory access was encountered
Hi! Thanks for the great work.
I'm running HAIS on my own dataset and have modified class_numpoint_mean_dict and class_radius_mean accordingly. The problem is that I get "CUDA kernel failed : an illegal memory access was encountered" when testing.
I traced the problem by inserting cudaGetLastError() calls in hierarchical_aggregation.cu, and found that the code breaks around the following lines:
assert(primary_num <= MAX_PRIMARY_NUM);
concat_fragments_<<<primary_num, (int)1>>>(
    cuda_fragment_idxs, cuda_fragment_offsets,
    cuda_primary_idxs, cuda_primary_offsets,
    cuda_primary_absorb_fragment_idx, cuda_primary_absorb_fragment_cnt,
    cuda_concat_idxs, cuda_concat_point_num,
    primary_num);
cudaDeviceSynchronize();
// got CUDA kernel failed : an illegal memory access was encountered here.
Do you have any suggestions?
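In case it helps anyone debugging the same way: kernel launches are asynchronous, so an error that only shows up after cudaDeviceSynchronize() can originate from any previously launched kernel, not necessarily the launch right above the check. Here is a CUDA-free toy model of that "sticky, deferred" error behavior (pure host C++, all names are illustrative, not real CUDA API):

```cpp
#include <cassert>
#include <string>

// Toy model of CUDA's deferred error reporting (no GPU involved).
// launch() only *records* a failure; it is observed at the next sync,
// which is why a check right after the launch passes while the check
// after synchronization fails.
struct ToyDevice {
    std::string pending;  // empty = no error recorded yet

    void launch(bool kernel_will_fault) {
        // A launch returns immediately; a runtime fault surfaces later.
        if (kernel_will_fault && pending.empty())
            pending = "an illegal memory access was encountered";
    }

    std::string error_before_sync() const {
        // Mirrors checking cudaGetLastError() right after a launch: the
        // kernel has not necessarily executed yet, so a runtime fault is
        // not visible here (only launch-configuration errors would be).
        return "";
    }

    std::string synchronize() {
        // Mirrors cudaDeviceSynchronize(): waits for all queued kernels,
        // so a fault from *any* of them is reported now.
        std::string e = pending;
        pending.clear();
        return e;
    }
};
```

So the failing line reported at the sync point is not necessarily the faulting kernel; it is just where the queue was first drained.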
It seems to be a memory-overflow problem. If the point clouds in your dataset are much denser or larger than ScanNet's, I suggest increasing the values of these macros. https://github.com/hustvl/HAIS/blob/73a6f591ad29e3399ea83e9a669873fe9bb05ab0/lib/hais_ops/src/hierarchical_aggregation/hierarchical_aggregation.cu#L8-L11
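To make the connection concrete: those macros fix the per-primary capacity of the buffers allocated with cudaMalloc, so a scene whose primaries absorb more fragments (or points) than the macro allows will write past its slot. A host-side sketch of the size arithmetic (the macro values below are illustrative placeholders, not the ones in the repo; check the linked lines for the real values):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative values only -- the real ones live in
// hierarchical_aggregation.cu (see the link above).
constexpr int MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM = 128;

// Bytes requested for cuda_primary_absorb_fragment_idx, mirroring
// the cudaMalloc call in the snippet above.
std::size_t absorb_idx_bytes(int primary_num) {
    return static_cast<std::size_t>(primary_num)
           * MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM * sizeof(int)
           + sizeof(int);
}

// Each primary owns exactly MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM slots in
// that flat buffer. If a kernel records more fragments than that for one
// primary, it scribbles into the next primary's slots (or past the end
// of the allocation) -- i.e. an illegal memory access.
bool fits(int fragments_for_this_primary) {
    return fragments_for_this_primary <= MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM;
}
```

Raising the macros only helps if the overflow is really a capacity problem; it does not fix an out-of-bounds index computed elsewhere.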
Hi, sorry to bother you again.
Each point cloud in my dataset always consists of 20,000 points. I've tried your suggestion and both increased and decreased those macros. I found that with different macro values, the number of samples that pass before the failure actually changes! For example, with the original settings I can get 8 samples through the test script, while after increasing the values 16x, only 3 samples pass. It breaks at either of the following two places (both after sync):
// for each fragment, find its primary
int *cuda_primary_absorb_fragment_idx; // array for saving the fragment idxs
int *cuda_primary_absorb_fragment_cnt; // array for saving the fragment nums
cudaMalloc((void**)&cuda_primary_absorb_fragment_idx, primary_num * MAX_PER_PRIMARY_ABSORB_FRAGMENT_NUM * sizeof(int) + sizeof(int));
cudaMalloc((void**)&cuda_primary_absorb_fragment_cnt, primary_num * sizeof(int) + sizeof(int));

printf("fragment_num = %d\n", fragment_num); // I've printed out the fragment number: mostly 10~40, never 0 on my dataset.
if (fragment_num != 0) {
    fragment_find_primary_<<<int(DIVUP(fragment_num, MAX_THREADS_PER_BLOCK)), (int)MAX_THREADS_PER_BLOCK>>>(
        primary_num, cuda_primary_offsets, cuda_primary_centers,
        fragment_num, cuda_fragment_offsets, cuda_fragment_centers,
        cuda_primary_absorb_fragment_idx, cuda_primary_absorb_fragment_cnt);
}
err = cudaGetLastError();
if (cudaSuccess != err) {
    fprintf(stderr, "before sync CUDA kernel failed : %s\n", cudaGetErrorString(err)); // This never fails in my case.
    exit(-1);
}
cudaDeviceSynchronize();
err = cudaGetLastError();
if (cudaSuccess != err) {
    fprintf(stderr, " after sync CUDA kernel failed : %s\n", cudaGetErrorString(err)); // However, this fails!
    exit(-1);
}

// concatenate fragments belonging to the same primary
int *cuda_concat_idxs;
int *cuda_concat_point_num;
cudaMalloc((void**)&cuda_concat_idxs, primary_num * MAX_PER_PRIMARY_ABSORB_POINT_NUM * 2 * sizeof(int) + sizeof(int));
cudaMalloc((void**)&cuda_concat_point_num, primary_num * sizeof(int) + sizeof(int));

assert(primary_num <= MAX_PRIMARY_NUM);
concat_fragments_<<<primary_num, (int)1>>>(
    cuda_fragment_idxs, cuda_fragment_offsets,
    cuda_primary_idxs, cuda_primary_offsets,
    cuda_primary_absorb_fragment_idx, cuda_primary_absorb_fragment_cnt,
    cuda_concat_idxs, cuda_concat_point_num,
    primary_num);
err = cudaGetLastError();
if (cudaSuccess != err) {
    fprintf(stderr, "before sync 2 CUDA kernel failed : %s\n", cudaGetErrorString(err));
    exit(-1);
}
cudaDeviceSynchronize();
err = cudaGetLastError();
if (cudaSuccess != err) {
    fprintf(stderr, "after sync 2 CUDA kernel failed : %s\n", cudaGetErrorString(err));
    exit(-1);
}
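For reference, the grid size in the first launch comes from DIVUP, which is just ceiling division; with fragment_num around 10~40 and a typical MAX_THREADS_PER_BLOCK, the whole launch is a single block, and every thread with an index >= fragment_num must early-return inside the kernel. A quick host-side sketch (the MAX_THREADS_PER_BLOCK value here is an assumption, not necessarily what HAIS uses):

```cpp
#include <cassert>

// Ceiling division, as used to size the grid in the launch above.
#define DIVUP(m, n) (((m) + (n) - 1) / (n))

// Assumed block size for illustration; check the repo for the real value.
constexpr int MAX_THREADS_PER_BLOCK = 512;

// Number of blocks launched for a given fragment count. Since blocks *
// threads usually exceeds fragment_num, the kernel itself must guard
// against thread indices >= fragment_num; a missing or wrong guard there
// would read garbage centers/offsets and could corrupt memory.
int grid_size(int fragment_num) {
    return DIVUP(fragment_num, MAX_THREADS_PER_BLOCK);
}
```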
@HelinXu Hi, I have the same problem as you. How did you solve it?
@Dragon-Vae Hi, I am on a different project and have not fully understood this behavior yet. I'll let you know if I figure it out :)
@HelinXu OK, thank you for your reply
@outsidercsy Hi, I also encountered an illegal memory access issue. Could #32 be the fix?
@eamonn-zh I encountered the same issue, and I found that when I change the batch size from 1 to 4, the problem disappears. So maybe it's not caused by memory overflow after all? Have you figured it out yet?