
CUDA11

Open trigal opened this issue 3 years ago • 10 comments

Hi, is there any way to make this project work with CUDA 11?

Thanks.

trigal avatar Mar 10 '21 12:03 trigal

Hi, I haven't tested with CUDA 11. I'd recommend giving it a try and seeing what happens.

haofeixu avatar Mar 18 '21 07:03 haofeixu

I tried it on a DGX A100 machine with A100-SXM4-40GB GPUs, using the NVIDIA Docker image nvcr.io/nvidia/pytorch:19.10-py3, which should meet the requirements in your description. The problem is that, as far as I understand, these GPUs are not compatible with CUDA 10 drivers.

When I try to run the network on an updated configuration with CUDA 11, the system hangs at https://github.com/haofeixu/aanet/blob/master/predict.py#L87, at the 'to(device)' call, so I suspect something is wrong with the model or, more likely, with the deform_conv package.
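
One possible cause worth ruling out, offered as an assumption rather than a confirmed diagnosis for this thread: the installed PyTorch build may not ship kernels for the GPU's architecture (an A100 reports compute capability 8.0), in which case the first CUDA call can spend a long time JIT-compiling PTX and look like a hang. The sketch below compares the device capability against the architectures the build was compiled for. The helper name `classify_support` is hypothetical; `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are the only PyTorch calls used.

```python
def classify_support(capability, arch_list):
    """Classify how a PyTorch build can serve a GPU (hypothetical helper).

    capability: (major, minor) tuple, e.g. (8, 0) for an A100.
    arch_list:  strings such as "sm_70" or "compute_80".
    Returns "native", "ptx-jit", or "unsupported".
    """
    major, minor = capability

    def parse(arch, prefix):
        # "sm_80" -> (8, 0); returns None for other prefixes or odd suffixes.
        kind, _, version = arch.partition("_")
        if kind != prefix or not version.isdigit() or len(version) < 2:
            return None
        return int(version[:-1]), int(version[-1])

    # Binary kernels run on the same major architecture with minor <= device.
    for arch in arch_list:
        parsed = parse(arch, "sm")
        if parsed and parsed[0] == major and parsed[1] <= minor:
            return "native"

    # PTX for an older or equal architecture can be JIT-compiled at first use.
    for arch in arch_list:
        parsed = parse(arch, "compute")
        if parsed and parsed <= (major, minor):
            return "ptx-jit"

    return "unsupported"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(cap, classify_support(cap, torch.cuda.get_arch_list()))
```

A result of "ptx-jit" or "unsupported" would point at the PyTorch build rather than at deform_conv; "native" would leave the extension as the more likely suspect.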

trigal avatar Mar 18 '21 09:03 trigal

Have you successfully compiled the deform_conv package?

haofeixu avatar Mar 18 '21 09:03 haofeixu

I'm pretty certain it compiled without errors, but I'll try again in the next few days and post the compiler output here.

trigal avatar Mar 23 '21 09:03 trigal

My GPU's driver is not compatible with CUDA 10, only with CUDA 11.0. Have you managed to build deformable_conv with CUDA 11.0?

zyl1336110861 avatar Mar 24 '21 13:03 zyl1336110861

I just compiled the deformable_conv module with CUDA 11.1, PyTorch 1.7.0, Python 3.7.4, and gcc 5.5. The first error I hit was "AT_CHECK is not declared in this scope", so I changed every "AT_CHECK" to "TORCH_CHECK" in the cpp source files, following #11. Be careful: this error message appears in the middle of the compile output, so it is easy to miss.
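
The rename described above can be sketched as a small script. This is only an illustration under stated assumptions: the function names and the set of file suffixes are hypothetical, and the directory that holds the extension sources is not given in this thread, so it has to be passed in by the caller.

```python
import re
from pathlib import Path

# Match the bare macro name only, so longer identifiers that merely
# contain "AT_CHECK" are left alone.
_AT_CHECK = re.compile(r"\bAT_CHECK\b")

# Assumed set of source suffixes; adjust to the files actually present.
SOURCE_SUFFIXES = {".cpp", ".h", ".cu", ".cuh"}


def replace_at_check(text):
    """Return text with every AT_CHECK macro call renamed to TORCH_CHECK."""
    return _AT_CHECK.sub("TORCH_CHECK", text)


def rewrite_tree(root):
    """Rewrite matching source files under root in place.

    Returns the list of paths that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        original = path.read_text()
        updated = replace_at_check(original)
        if updated != original:
            path.write_text(updated)
            changed.append(path)
    return changed
```

Calling `rewrite_tree` on the directory containing the extension sources and then rebuilding the module should reproduce the manual edit; reviewing the returned list (or a `git diff`) before rebuilding is a sensible precaution.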

zyl1336110861 avatar Mar 27 '21 09:03 zyl1336110861

@haofeixu

zyl1336110861 avatar Mar 27 '21 09:03 zyl1336110861

Thanks @zyl1336110861 for sharing your solution! Hope it can be helpful for others!

haofeixu avatar Apr 03 '21 17:04 haofeixu

It runs successfully on a single GPU, but when I use multiple GPUs the process hangs. My setup is CUDA 11.3, PyTorch 1.9.0, and Python 3.8. Is there any way to fix that? @haofeixu @ all

q5390498 avatar Feb 08 '22 11:02 q5390498

How did you solve it? I couldn't find a description for #11.

llllooorange avatar May 13 '22 06:05 llllooorange

Hi all, sorry for the late response.

If this issue is still relevant to you, I would suggest trying our new GMStereo model: https://haofeixu.github.io/unimatch/ & https://github.com/autonomousvision/unimatch. No CUDA op is required. A Colab demo is also provided so you can try the model in your browser. Hope it helps, thanks.

haofeixu avatar Nov 13 '22 04:11 haofeixu