
A weird bug: The data is not on the same device.

Open caiyongqi opened this issue 1 year ago • 11 comments

https://github.com/YuliangXiu/ICON/blob/ece5a09aa2d56aec28017430e65a0352622a0f30/lib/dataset/mesh_util.py#L283

```python
print(triangles.device)  # cuda:1
print(points.device)  # cuda:1

residues, pts_ind, _ = point_to_mesh_distance(points, triangles)

print(triangles.device)  # cuda:1
print(pts_ind.device)  # cuda:0
print(residues.device)  # cuda:0
```

Command: `python -m apps.infer -cfg ./configs/icon-filter.yaml -gpu 1 -in_dir {*} -out_dir {*}`

Setting `CUDA_VISIBLE_DEVICES=1` doesn't work either.
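The check done by hand with `print` above could be wrapped in a small helper that fails loudly as soon as any value ends up on a different device. This is only a sketch: `assert_same_device` is a hypothetical name, not part of ICON or Kaolin, and it relies only on each object having a `.device` attribute (as PyTorch tensors do). The stand-in objects below replace real tensors so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def assert_same_device(**tensors):
    """Raise if the named objects do not all report the same `.device`.

    Hypothetical helper: works with anything exposing `.device`,
    e.g. torch tensors. Returns the shared device as a string.
    """
    devices = {name: str(t.device) for name, t in tensors.items()}
    if len(set(devices.values())) > 1:
        raise RuntimeError(f"tensors on different devices: {devices}")
    return next(iter(devices.values()))


# Stand-ins for tensors, mirroring the devices reported in this issue.
triangles = SimpleNamespace(device="cuda:1")
points = SimpleNamespace(device="cuda:1")
pts_ind = SimpleNamespace(device="cuda:0")

assert_same_device(points=points, triangles=triangles)  # passes
try:
    assert_same_device(triangles=triangles, pts_ind=pts_ind)
except RuntimeError as err:
    print(err)  # names each tensor together with its device
```

With real tensors, a temporary workaround until the library is fixed might be to move the outputs back explicitly, for example `residues = residues.to(points.device)`.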

caiyongqi avatar Jul 16 '22 05:07 caiyongqi

Just pushed a fix to Kaolin; let me know if that resolves the issue.

Caenorst avatar Jul 16 '22 16:07 Caenorst

> Just pushed a fix to Kaolin; let me know if that resolves the issue.

Hi, I tested this function and now its input and output data are on the same device.

caiyongqi avatar Jul 16 '22 17:07 caiyongqi

But ICON now gets stuck. I'm not sure whether this is caused by Kaolin; I'm checking. [image]

gpu 1: [image] [image]

gpu 0: It's normal. [image]

caiyongqi avatar Jul 16 '22 17:07 caiyongqi

In the first screenshot, you still have residues and pts_ind on different devices than triangles?

Caenorst avatar Jul 16 '22 17:07 Caenorst

> In the first screenshot, you still have residues and pts_ind on different devices than triangles?

No, they are on the same device.

caiyongqi avatar Jul 16 '22 17:07 caiyongqi

You can add some logging around the call to identify where it is hanging. Let me know if a function in Kaolin is the culprit and I'll happily address that.
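One way to follow this suggestion is to wrap each suspect step in a context manager that logs entry and exit and, if the step runs too long, dumps the stack of every thread. The sketch below uses only the Python standard library (`logging` and `faulthandler`). `traced_step`, the logger name, and the timeout are illustrative assumptions, not anything from ICON or Kaolin.

```python
import faulthandler
import logging
import sys
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("hang_probe")  # hypothetical logger name
log.setLevel(logging.INFO)


@contextmanager
def traced_step(name, timeout_s=60.0):
    """Log when a step starts and ends; dump stacks if it exceeds timeout_s."""
    log.info("enter %s", name)
    # If the body is still running after timeout_s seconds, Python writes the
    # traceback of every thread to the original stderr without exiting.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=sys.__stderr__)
    try:
        yield
    finally:
        # Runs on normal completion and on exceptions alike.
        faulthandler.cancel_dump_traceback_later()
        log.info("exit %s", name)


# Demo with a cheap placeholder step; in practice the body would be the
# suspect call, such as the mesh extraction step.
with traced_step("demo step", timeout_s=5.0):
    total = sum(range(1000))
```

A step that logs "enter" but never "exit" is where the program hangs, and the dumped traceback shows the Python frame that was waiting. A hang inside a native CUDA kernel will only show the Python call that launched it.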

Caenorst avatar Jul 16 '22 17:07 Caenorst

> You can add some logging around the call to identify where it is hanging. Let me know if a function in Kaolin is the culprit and I'll happily address that.

It hangs here: https://github.com/YuliangXiu/ICON/blob/46c76d70e99825a00a7818b364f8832d3094203f/lib/common/seg3d_lossless.py#L599

Seems to be caused by `kaolin.ops.conversions.voxelgrids_to_trianglemeshes()`.

caiyongqi avatar Jul 16 '22 20:07 caiyongqi

I can see in the code where this might crash. I will address it in the coming week.

Caenorst avatar Jul 16 '22 20:07 Caenorst

@Caenorst @caiyongqi Any update on this issue?

YuliangXiu avatar Jul 30 '22 15:07 YuliangXiu

> You can add some logging around the call to identify where it is hanging. Let me know if a function in Kaolin is the culprit and I'll happily address that.
>
> It hangs here: https://github.com/YuliangXiu/ICON/blob/46c76d70e99825a00a7818b364f8832d3094203f/lib/common/seg3d_lossless.py#L599
>
> Seems to be caused by `kaolin.ops.conversions.voxelgrids_to_trianglemeshes()`.

Hi, I have the same error, and it is also caused by `kaolin.ops.conversions.voxelgrids_to_trianglemeshes()`. Have you solved this problem? [image]

myccver avatar Oct 25 '22 21:10 myccver

I hit the same error when running the demo. After I set `-gpu 0` in the command together with `CUDA_VISIBLE_DEVICES=1`, it works.
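A likely reason this combination works is that `CUDA_VISIBLE_DEVICES` renumbers the GPUs a process can see: with `CUDA_VISIBLE_DEVICES=1`, physical GPU 1 is the only visible device and appears inside the process as logical index 0. The sketch below models that mapping in plain Python. It is a simplified illustration that handles integer ids only (not GPU UUIDs); both function names are hypothetical, and it assumes `-gpu` is interpreted as a logical index inside the process.

```python
def visible_devices(cuda_visible_devices, physical_count):
    """Return the physical GPU ids a process sees, in logical order.

    Simplified model of CUDA_VISIBLE_DEVICES: integer ids only.
    """
    if cuda_visible_devices is None:  # variable unset: every GPU is visible
        return list(range(physical_count))
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            break  # entries after an invalid id are ignored
        visible.append(int(token))
    return visible


def physical_id(logical_index, cuda_visible_devices, physical_count):
    """Map a logical index such as the 0 in cuda:0 to a physical GPU id."""
    visible = visible_devices(cuda_visible_devices, physical_count)
    if not 0 <= logical_index < len(visible):
        raise IndexError(
            f"logical device {logical_index} does not exist; "
            f"only {len(visible)} device(s) visible"
        )
    return visible[logical_index]


# Two-GPU machine with CUDA_VISIBLE_DEVICES=1:
print(physical_id(0, "1", 2))  # logical 0 is physical GPU 1
```

Under this model, combining `CUDA_VISIBLE_DEVICES=1` with logical index 1 asks for a device that does not exist in the process, while index 0 selects physical GPU 1.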

myccver avatar Oct 25 '22 21:10 myccver