
Cannot run pipeline with GPU but works with CPU

andimo11 opened this issue 2 years ago • 7 comments


Describe the issue

I've been testing Open3D-ML pretrained models before I set up a configuration for a custom dataset.

I am trying to do this by running the predefined scripts.

The CPU works, but the GPU gives me an error: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
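For what it's worth, the same F.linear -> cublasSgemm path can be exercised outside of Open3D-ML with a minimal sketch like this (the tensor shapes are illustrative, not taken from the pipeline):

import torch

# Tiny stand-in for the fc0 linear layer in randlanet.py; shapes are made up.
x = torch.randn(1, 45056, 3, device="cuda")
fc = torch.nn.Linear(3, 8).to("cuda")

y = fc(x)                 # F.linear -> cublasSgemm under the hood
torch.cuda.synchronize()  # surfaces any asynchronous CUDA error here
print(y.shape, y.device)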

Steps to reproduce the bug

$ python scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset.dataset_path /home/alex/Desktop/Datasets/SemanticKITTI --pipeline SemanticSegmentation --dataset.use_cache True --split test 

Using external Open3D-ML in /home/alex/Desktop/NIST_T3/PROJECT/Open3D-ML
regular arguments
batch_size: null
cfg_dataset: null
cfg_file: ml3d/configs/randlanet_semantickitti.yml
cfg_model: null
cfg_pipeline: null
ckpt_path: null
dataset: null
dataset_path: null
device: gpu
framework: torch
main_log_dir: null
max_epochs: null
mode: null
model: null
pipeline: SemanticSegmentation
seed: 0
split: test

extra arguments
dataset.dataset_path: /home/alex/Desktop/Datasets/SemanticKITTI
dataset.use_cache: 'True'
pipeline.num_workers: '0'

INFO - 2022-04-26 18:17:20,184 - semantic_segmentation - DEVICE : cuda
INFO - 2022-04-26 18:17:20,184 - semantic_segmentation - Logging in file : ./logs/RandLANet_SemanticKITTI_torch/log_test_2022-04-26_18:17:20.txt
INFO - 2022-04-26 18:17:20,222 - semantickitti - Found 20351 pointclouds for test
INFO - 2022-04-26 18:18:40,823 - semantic_segmentation - Initializing from scratch.
INFO - 2022-04-26 18:18:40,825 - semantic_segmentation - Started testing

Error message

Traceback (most recent call last):
  File "/home/alex/Desktop/NIST_T3/PROJECT/Open3D-ML/scripts/run_pipeline.py", line 163, in <module>
    main()
  File "/home/alex/Desktop/NIST_T3/PROJECT/Open3D-ML/scripts/run_pipeline.py", line 151, in main
    pipeline.run_test()
  File "/home/alex/Desktop/NIST_T3/PROJECT/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py", line 233, in run_test
    results = model(inputs['data'])
  File "/home/alex/anaconda3/envs/o3Dml9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/alex/Desktop/NIST_T3/PROJECT/Open3D-ML/ml3d/torch/models/randlanet.py", line 266, in forward
    feat = self.fc0(feat).transpose(-2, -1).unsqueeze(
  File "/home/alex/anaconda3/envs/o3Dml9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/alex/anaconda3/envs/o3Dml9/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/alex/anaconda3/envs/o3Dml9/lib/python3.9/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Expected behavior

When --device is set to cpu, everything works; however, when it is set to gpu or cuda, I get the error above.
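For reference, the working CPU run differs only in the device flag:

$ python scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset.dataset_path /home/alex/Desktop/Datasets/SemanticKITTI --pipeline SemanticSegmentation --dataset.use_cache True --split test --device cpu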

Open3D, Python and System information

- Operating system: Ubuntu 20.04
- Python version: 3.9
- Open3D version: (output from python: `print(open3d.__version__)`)
- System type: x86
- Is this remote workstation?: yes or no
- How did you install Open3D?: pip
- Compiler version (if built from source): gcc 7.5

Additional information

I've tried with CUDA 11.6 and 10.1 and got the same error. This error pops up with different datasets as well (SemanticKITTI and Stanford3D).

Any tips or ideas would be great, thank you!

andimo11 avatar Apr 27 '22 00:04 andimo11

Same here,

CUDA 11.3 failed, but CUDA 10.2 was OK.

conby avatar Jul 07 '22 08:07 conby

@conby and @andimo11 Any update on this?

The same problem here with CUDA version 11.0 and torch version 1.8.2. It works on CPU but not on GPU. I used requirements-torch-cuda.txt to install a compatible combination of torch and CUDA.

- Operating system: Ubuntu 20.04
- Python version: 3.9.5
- Open3D version: 0.15.2
- Is this remote workstation?: no
- How did you install Open3D?: pip

shayannikoohemat avatar Jul 21 '22 15:07 shayannikoohemat

All,

I am also having the same problem. Checked with CUDA versions 10.2, 11.1, 11.3, and 11.7 and torch version 1.8.2. Please note that it works perfectly on CPU but not on GPU (Tesla V100 - 32GB). I have four of them but am currently using one GPU via:

os.environ["CUDA_DEVICE_ORDER"]     = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]  = "0"

Update:

Please note that, strangely, it works well on a single-GPU RTX 2080 Ti machine with CUDA 10.2 and Ubuntu 22.04 (upgraded from 18.04 and 20.04). In addition, I tested on a machine with four Nvidia Titan X GPUs, CUDA 10.2, and an upgraded Ubuntu 22.04, and it works flawlessly.

I used requirements-torch-cuda.txt to install the compatible torch and CUDA versions provided by the Open3D-ML team.

  • Operating system: Ubuntu 22.04
  • Python version: 3.8.13
  • GCC version: 7.5.0
  • Open3D version: 0.15.2
  • Is this remote workstation?: Yes
  • How did you install Open3D?: pip

I have been installing/uninstalling all possible drivers, CUDA toolkits, and cuDNN versions for the last 3 days, but I can't fix the issue. @andimo11, @conby, and @shayannikoohemat, please let me know if you have any suggestions.

@yxlao, please help us fix this issue ASAP. I cannot use my Tesla V100 GPUs at all.

Looking forward to any swift help!

preethamam avatar Sep 07 '22 21:09 preethamam

Hi there, I tried randlanet_semantickitti on several setups. Here is the result:

| Video card | Nvidia driver | CUDA | cuDNN | Works? |
|---|---|---|---|---|
| local TITAN X (Pascal) 12GB | 470.141.03 | 10.2, 11.1, 11.4 | 8 | Yes |
| local GeForce RTX 2080 Ti 12GB | 470.141.03 | 11.4 | 8 | Yes |
| local GeForce RTX 3080 Laptop 16GB | 510.85.02 | 11.1 | 8 | Yes |
| AWS p3.2xlarge Tesla V100-SXM2-16GB | 470.57.02 | 10.2, 11.1, 11.4 | 8 and w/o | No |
| AWS p3.2xlarge Tesla V100-SXM2-16GB | 510.73.08 | 11.1 | 8 | No |
| AWS p2.xlarge Tesla K80 12GB | 470.141.03 | 11.1 | 8 | Yes |
| AWS g4dn.2xlarge Tesla T4 16GB | 510.73.08 | 11.1 | 8 | Yes |

From this table, it looks like the Tesla V100 never works, while everything else does. This also matches @preethamam's experience.

@andimo11, @conby, @shayannikoohemat, what are your video cards?
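If it helps, you can list them with, for example:

$ nvidia-smi -L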

Although I am also running inference (--split test) on the semantic segmentation pipeline with the randlanet model and the torch backend, the offending line is different for me (compare this to what @andimo11 got):

  ...
  File "/deepmap_workspace/Open3D-ML/ml3d/torch/models/randlanet.py", line 636, in forward
    scores = self.score_fn(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
  ...
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

kukuruza avatar Oct 12 '22 20:10 kukuruza

I finally solved the issue by replacing the torch/torchvision packages pre-built by the Open3D team with the ones from the public repositories. That is, replace the content of requirements-torch-cuda.txt with:

torch==1.8.1
torchvision==0.9.1
tensorboard
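One way to apply the change (a sketch, assuming a pip-managed environment):

$ pip uninstall -y torch torchvision
$ pip install -r requirements-torch-cuda.txt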

Open3D will then throw the warning "Using the Open3D PyTorch ops with CUDA 11 may have stability issues!" and ask you to compile torch with certain flags. But I don't see any problems in my case, so I just ignore the warning.

Additionally, Open3D says: "Warning: Open3D was built with CUDA 11.0 but PyTorch was built with CUDA 10.2. Falling back to CPU for now. Otherwise, install PyTorch with CUDA 11.0." But I can see via nvidia-smi that the GPU is actually used, and it is considerably faster than when I disable the GPU. So I am not sure this warning really means anything.
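For anyone who wants to double-check, GPU utilization during the run can be watched with something like:

$ watch -n 1 nvidia-smi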

The solution works for V100 with CUDA 11.1.1 and 11.7.1.

kukuruza avatar Oct 13 '22 15:10 kukuruza

More curious developments. Open3D says it is compiled with CUDA 11.0, so I thought that PyTorch compiled with CUDA 11 would be a better match, but it is not.

Setup: Tesla V100, Ubuntu 18.04, CUDA 11.1, Python 3.6

The following requirements-torch-cuda.txt file installs PyTorch compiled with CUDA 10.2. You can verify that by running python3 -c "import torch; print(torch.version.cuda)". This setup works:

torch==1.8.1
torchvision==0.9.1
tensorboard

The requirements-torch-cuda.txt file below installs PyTorch compiled with CUDA 11.1, which I expected to be a better match for Open3D compiled with CUDA 11.0. However, this results in the CUBLAS_STATUS_EXECUTION_FAILED error. So PyTorch compiled with CUDA 11.1 is the problem, no matter whether it is built by the Open3D team or comes from the official repos:

-f https://download.pytorch.org/whl/torch_stable.html
torch==1.8.1+cu111
torchvision==0.9.1+cu111
tensorboard

kukuruza avatar Oct 14 '22 15:10 kukuruza

@kukuruza The GPU that I tested was: NVIDIA GeForce GTX 1650

shayannikoohemat avatar Nov 28 '22 09:11 shayannikoohemat