Self-supervised-Monocular-Trained-Depth-Estimation-using-Self-attention-and-Discrete-Disparity-Volum

Cuda 11 Compatibility

Open · jamesheatonrdm opened this issue 4 years ago · 3 comments

I want to train a model on my GPU, a GeForce 3060, which is only supported by CUDA 11 and above.

If I use the CUDA (10.1) and PyTorch (0.4.1) versions specified for this project, I cannot send anything to the GPU: calling .to(device) makes Python hang for several minutes and then fails with the error "CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."
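One way to see the mismatch is to compare the GPU's compute capability with the minimum CUDA toolkit that can target it. The sketch below hard-codes a small table; the values are an assumption based on NVIDIA's documented minimums and do not come from this repository, and the helper names are hypothetical. On a machine with PyTorch installed, `torch.cuda.get_device_capability()` reports the capability tuple and `torch.version.cuda` the CUDA version the installed build was compiled against.

```python
# Hypothetical table: minimum CUDA toolkit able to target each GPU compute
# capability. Assumed from NVIDIA's documented minimums; extend as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),    # Volta, e.g. V100
    (7, 5): (10, 0),   # Turing, e.g. RTX 20xx
    (8, 0): (11, 0),   # Ampere, e.g. A100
    (8, 6): (11, 1),   # Ampere, e.g. RTX 3060 / 3090
}


def parse_version(text):
    """Turn '10.1' into (10, 1); a bare major version such as '11' becomes (11, 0)."""
    parts = [int(p) for p in text.split(".")[:2]]
    return tuple(parts + [0] * (2 - len(parts)))


def cuda_supports_gpu(cuda_version, capability):
    """Return True if cuda_version meets the minimum for capability.

    Capabilities missing from the table raise KeyError instead of guessing.
    """
    return parse_version(cuda_version) >= MIN_CUDA_FOR_CAPABILITY[capability]


if __name__ == "__main__":
    # The RTX 3060 reports compute capability (8, 6).
    for toolkit in ("9.2", "10.1", "11.1"):
        print(toolkit, cuda_supports_gpu(toolkit, (8, 6)))
```

Under these assumptions, neither CUDA 9.2 nor 10.1 can target an 8.6 GPU, which is consistent with the failures described in this thread.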

If I instead install the PyTorch version that matches my CUDA version, following the instructions at https://pytorch.org/get-started/locally/, running the trainer fails with: ninja: build stopped: subcommand failed

This is caused by multiple compilation errors in the C++ files, which use deprecated APIs. For example: /home/james/ML/Projects/Self-supervised-Monocular-Trained-Depth-Estimation-using-Self-attention-and-Discrete-Disparity-Volum/venv/lib/python3.6/site-packages/torch/include/ATen/Functions.h:467:22: note: no known conversion for argument 1 from ‘at::DeprecatedTypeProperties’ to ‘c10::IntArrayRef {aka c10::ArrayRef<long int>}’
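The compiler note says a `DeprecatedTypeProperties` (what `Tensor::type()` returns in PyTorch 1.x) is being passed where an ATen function expects a size list. If the fix is to go through the C++ sources by hand, a rough scanner can at least list the candidate lines. This is a minimal sketch: `scan_source` and the pattern list are hypothetical, cover only a few common pre-1.0 idioms, and were not derived from this repository's files, so the compiler output remains the authority.

```python
import re

# Hypothetical, deliberately partial list of old ATen idioms and their
# newer spellings in PyTorch 1.x.
DEPRECATED_PATTERNS = [
    (re.compile(r"\bAT_CHECK\s*\("), "use TORCH_CHECK(...) instead of AT_CHECK"),
    (re.compile(r"\.data<"), "use .data_ptr<T>() instead of .data<T>()"),
    (re.compile(r"AT_DISPATCH_\w+\(\s*[\w.]+\.type\(\)"),
     "pass tensor.scalar_type() to the dispatch macro instead of .type()"),
    (re.compile(r"\bat::\w+\(\s*[\w.]+\.type\(\)"),
     "factory call with .type() first; pass sizes first and tensor.options() last"),
]


def scan_source(text):
    """Return (line_number, hint) pairs for each line matching an old idiom."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, hint in DEPRECATED_PATTERNS:
            if pattern.search(line):
                hits.append((lineno, hint))
    return hits
```

For example, calling `scan_source` on the contents of each `.cpp`/`.cu` file in the extension directory gives a checklist to work through before rebuilding with ninja.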

My questions are as follows:

Is it possible to build the version of PyTorch this project requires (0.4.1) against CUDA 11?

If not, is the only option to go through the code and fix all of the deprecation errors?

Any help is appreciated.

jamesheatonrdm, Oct 14 '21 08:10

One more thing: I can get it working on the CPU alone by specifying the --no_cuda option; however, there appears to be a bug at line 480 in trainer.py:

identity_reprojection_loss.shape).cuda() * 0.00001

This throws an error when training with the --no_cuda option specified.
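The fragment above calls `.cuda()` unconditionally, so it cannot work on a CPU-only run. Assuming the full statement creates a random tensor with the shape of `identity_reprojection_loss` and adds it to that loss (only the tail of the line is visible here), a device-agnostic version can use `torch.randn_like`, which creates the noise with the same shape, dtype and device as its input. The helper name below is hypothetical; this is a sketch, not the project's code.

```python
import torch


def add_tie_breaking_noise(identity_reprojection_loss, scale=0.00001):
    """Add tiny random noise to the loss on whatever device it already lives.

    torch.randn_like matches the input's shape, dtype and device, so no
    hard-coded .cuda() call is needed and --no_cuda runs keep working.
    """
    noise = torch.randn_like(identity_reprojection_loss) * scale
    return identity_reprojection_loss + noise


if __name__ == "__main__":
    loss = torch.zeros(2, 1, 4, 4)  # a CPU tensor, as with --no_cuda
    noisy = add_tie_breaking_noise(loss)
    print(noisy.shape, noisy.device)
```

Deriving the device from the tensor itself avoids threading a separate device flag into this part of the loss computation.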

jamesheatonrdm, Oct 14 '21 12:10

Hi, I ran into the same error when trying to train it on a 3090. Did you find a solution?

panxkun, Nov 28 '21 03:11

I had the same problem: the code would not run in the configuration provided by the author, and PyTorch 0.4.1 did not work with CUDA 10. As far as I know, the highest CUDA version PyTorch 0.4.1 supports is 9.2.

takisu0916, Jun 25 '22 02:06