
illegal memory access with PointTransformer in torch

Open • cosama opened this issue 3 years ago • 10 comments

I am getting a RuntimeError: CUDA error: an illegal memory access was encountered when running the PointTransformer. I have tested this on two different systems with NVIDIA GPUs and got the same error each time.

I am running this in a Docker container:

FROM nvidia/cuda:11.1.1-devel-ubuntu20.04

RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
        libgl1-mesa-dev \
        python3 \
        python3-dev \
        python3-pip && \
    pip3 install --no-cache-dir --upgrade pip && \
    rm -rf /var/lib/apt/lists/*

RUN pip install open3d numpy matplotlib tensorboard -U

RUN pip install https://s3.us-west-1.wasabisys.com/open3d-downloads/torch-1.8.2-cp38-cp38-linux_x86_64.whl \
    torchvision==0.9.2+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
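
For reference, inside the container I also run a quick sanity check that PyTorch actually sees the GPU (not part of the repro, just to rule out an environment problem):

import torch

print(torch.__version__)          # expect 1.8.2 from the wheel above
print(torch.version.cuda)         # expect 11.1 to match the base image
print(torch.cuda.is_available())  # expect True when docker run is started with --gpus all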

And the code I am running is:

import open3d.ml as _ml3d
import open3d.ml.torch as ml3d
import numpy as np

data = {
    "point": np.random.rand(300000).reshape(-1, 3).astype(np.float32) * 100,
    "feat": np.random.rand(300000).reshape(-1, 3).astype(np.float32) * 255,
    "label": np.zeros(100000, dtype=np.float32)
}

cfg = _ml3d.utils.Config.load_from_file("pointtransformer_s3dis.yml")
model = ml3d.models.PointTransformer(**cfg.model)
pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=None, device="cpu", **cfg.pipeline)
pipeline.load_ckpt(ckpt_path="./logs/pointtransformer_s3dis_202109241350utc.pth")
result = pipeline.run_inference(data)

Here is the full error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_11/2004900684.py in <module>
      3 pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=None, device="cpu", **cfg.pipeline)
      4 pipeline.load_ckpt(ckpt_path="./logs/pointtransformer_s3dis_202109241350utc.pth")
----> 5 result = pipeline.run_inference(data)

open3d/_ml3d/torch/pipelines/semantic_segmentation.py in run_inference(self, data)
    165         with torch.no_grad():
    166             for unused_step, inputs in enumerate(infer_loader):
--> 167                 results = model(inputs['data'])
    168                 self.update_tests(infer_sampler, inputs, results)
    169 

torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

open3d/_ml3d/torch/models/point_transformer.py in forward(self, batch)
    173 
    174         for i in range(5):
--> 175             p, f, r = self.encoders[i]([points[i], feats[i], row_splits[i]])
    176             points.append(p)
    177             feats.append(f)

torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

torch/nn/modules/container.py in forward(self, input)
    117     def forward(self, input):
    118         for module in self:
--> 119             input = module(input)
    120         return input
    121 

torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

open3d/_ml3d/torch/models/point_transformer.py in forward(self, pxo)
    642         identity = feat
    643         feat = self.relu(self.bn1(self.linear1(feat)))
--> 644         feat = self.relu(self.bn2(self.transformer2([point, feat, row_splits])))
    645         feat = self.bn3(self.linear3(feat))
    646         feat += identity

torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

open3d/_ml3d/torch/models/point_transformer.py in forward(self, pxo)
    429         feat_q, feat_k, feat_v = self.linear_q(feat), self.linear_k(
    430             feat), self.linear_v(feat)  # (n, c)
--> 431         feat_k = queryandgroup(self.nsample,
    432                                point,
    433                                point,

open3d/_ml3d/torch/models/point_transformer.py in queryandgroup(nsample, points, queries, feat, idx, points_row_splits, queries_row_splits, use_xyz)
    679         queries = points
    680     if idx is None:
--> 681         idx = knn_batch(points,
    682                         queries,
    683                         k=nsample,

open3d/_ml3d/torch/models/point_transformer.py in knn_batch(points, queries, k, points_row_splits, queries_row_splits, return_distances)
    733             -1, k).long().cuda(), ans.neighbors_distance.reshape(-1, k).cuda()
    734     else:
--> 735         return ans.neighbors_index.reshape(-1, k).long().cuda()
    736 
    737 

RuntimeError: CUDA error: an illegal memory access was encountered

cosama commented on Dec 07 '21

@cosama There is a NaN value in one of the point clouds of the S3DIS dataset, which causes the above problem. It is fixed in #454.

sanskar107 commented on Jan 04 '22

@sanskar107 Thanks for working on this. I will test your solution once I get back to this. I personally would not close issues until the pull request is merged into master. I hope that happens soon.

cosama commented on Jan 07 '22

@sanskar107 I just managed to test this with your branch from #454 and I still get the same illegal memory access error. As you can see in my issue, I do not actually use the S3DIS dataset, just the trained checkpoint; instead I load my own point cloud from a PLY file.

Unless I am doing something totally wrong with the data (I do not use any DataLoader, so it might end up on the wrong device or something), this issue is still not resolved. In that case, can we reopen it please?

In case I am doing something wrong with the data here, I feel it would considerably enhance the usability of this library if it were documented how to load custom data and run the models on it. I don't think many people using this library for their own work are interested only in running models on the training datasets.
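
For reference, this is roughly how I currently build the data dict from my own point cloud (a minimal sketch; the file name is a placeholder and the labels are dummies, since I only care about the predictions):

import numpy as np
import open3d as o3d

# Load a custom point cloud; "my_scan.ply" is a placeholder path.
pcd = o3d.io.read_point_cloud("my_scan.ply")
points = np.asarray(pcd.points, dtype=np.float32)
if pcd.has_colors():
    feat = np.asarray(pcd.colors, dtype=np.float32) * 255.0  # transform() divides by 255
else:
    feat = np.ones_like(points) * 255.0  # dummy constant colour features

data = {
    "point": points,
    "feat": feat,
    "label": np.zeros(len(points), dtype=np.int32),  # dummy labels, predictions only
}
result = pipeline.run_inference(data)  # pipeline set up as in the description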

Happy to help if there is something within my ability that can be done.

cosama commented on Jan 12 '22

I updated the description so it no longer requires any point cloud, so this should be reproducible on any system without additional data.

Also, if I set device="gpu" in the above sample, I now get this:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_902918/233953271.py in <module>
     13 pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=None, device="gpu", **cfg.pipeline)
     14 pipeline.load_ckpt(ckpt_path="./logs/pointtransformer_s3dis_202109241350utc.pth")
---> 15 result = pipeline.run_inference(data)

~/Software/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py in run_inference(self, data)
    165         with torch.no_grad():
    166             for unused_step, inputs in enumerate(infer_loader):
--> 167                 results = model(inputs['data'])
    168                 self.update_tests(infer_sampler, inputs, results)
    169 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/Software/Open3D-ML/ml3d/torch/models/point_transformer.py in forward(self, batch)
    173 
    174         for i in range(5):
--> 175             p, f, r = self.encoders[i]([points[i], feats[i], row_splits[i]])
    176             points.append(p)
    177             feats.append(f)

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
    117     def forward(self, input):
    118         for module in self:
--> 119             input = module(input)
    120         return input
    121 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/Software/Open3D-ML/ml3d/torch/models/point_transformer.py in forward(self, pxo)
    534             point, row_splits = new_point, new_row_splits
    535         else:
--> 536             feat = self.relu(self.bn(self.linear(feat)))  # (n, c)
    537         return [point, feat, row_splits]
    538 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
     92 
     93     def forward(self, input: Tensor) -> Tensor:
---> 94         return F.linear(input, self.weight, self.bias)
     95 
     96     def extra_repr(self) -> str:

~/.local/lib/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1751     if has_torch_function_variadic(input, weight):
   1752         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1753     return torch._C._nn.linear(input, weight, bias)
   1754 
   1755 

RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

I'm not sure what is going on here. I would appreciate it if someone could confirm this.

cosama commented on Jan 12 '22

Hello @cosama, I was also having the same problem with both device="gpu" and device="cpu".

I managed to solve it for the GPU case by adding .cuda() in point_transformer.py, in the transform() function, to data['point'] and data['feat'] (lines 300 and 302):

data['point'] = torch.from_numpy(points).to(torch.float32).cuda()
if feat is not None:
    data['feat'] = torch.from_numpy(feat).to(torch.float32).cuda() / 255.0
data['label'] = torch.from_numpy(labels).to(torch.int64)

And to use device="cpu", I had to remove .cuda() in knn_batch() (lines 733 and 735):

if return_distances:
    return ans.neighbors_index.reshape(-1, k).long(), ans.neighbors_distance.reshape(-1, k)
else:
    return ans.neighbors_index.reshape(-1, k).long()
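
A more robust fix would probably derive the device from the input instead of hard-coding either case; something along these lines (just a sketch, not tested):

# Sketch: return the neighbor indices/distances on whatever device the input points are on,
# instead of hard-coding .cuda() or plain CPU tensors.
device = points.device
if return_distances:
    return (ans.neighbors_index.reshape(-1, k).long().to(device),
            ans.neighbors_distance.reshape(-1, k).to(device))
else:
    return ans.neighbors_index.reshape(-1, k).long().to(device)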

I hope this helps.

ElMehdi-E commented on Jan 12 '22

@ElMehdi-E Thank you so much for this.

Yeah, that actually helps a lot. I was worried I was screwing things up horribly here and that confirms it is not just me.

I can confirm that your CPU fix works, but only if there is an NVIDIA GPU present on the system; otherwise you run into:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/tmp/ipykernel_28850/3019627965.py in <module>
     10     print(name)
     11     pipeline, cfg = load_pipeline(cfg_file)
---> 12     result = pipeline.run_inference(data)
     13     labels[name] = result['predict_labels']

~/Work/Software/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py in run_inference(self, data)
    165         with torch.no_grad():
    166             for unused_step, inputs in enumerate(infer_loader):
--> 167                 results = model(inputs['data'])
    168                 self.update_tests(infer_sampler, inputs, results)
    169 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/Work/Software/Open3D-ML/ml3d/torch/models/point_transformer.py in forward(self, batch)
    173 
    174         for i in range(5):
--> 175             p, f, r = self.encoders[i]([points[i], feats[i], row_splits[i]])
    176             points.append(p)
    177             feats.append(f)

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
    117     def forward(self, input):
    118         for module in self:
--> 119             input = module(input)
    120         return input
    121 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/Work/Software/Open3D-ML/ml3d/torch/models/point_transformer.py in forward(self, pxo)
    517             new_row_splits = torch.LongTensor(new_row_splits).to(
    518                 row_splits.device)
--> 519             idx = furthest_point_sample_v2(point, row_splits,
    520                                            new_row_splits)  # (m)
    521             new_point = point[idx.long(), :]  # (m, 3)

~/Work/Software/Open3D-ML/ml3d/torch/utils/pointnet/pointnet2_utils.py in forward(ctx, xyz, row_splits, new_row_splits)
     81         """
     82         if not open3d.core.cuda.device_count() > 0:
---> 83             raise NotImplementedError
     84 
     85         if not xyz.is_contiguous():

NotImplementedError: 

I feel that if you are running this on the CPU, you usually won't have a GPU available :smile: , so that is probably worth fixing.
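
The only stopgap I see for now is to check upfront whether a CUDA device is visible and bail out with a clearer message, using the same check the library code itself does (a minimal sketch):

import open3d

# furthest_point_sample_v2 raises NotImplementedError without CUDA, even with device="cpu",
# so fail early with a clearer message instead of crashing mid-forward.
if open3d.core.cuda.device_count() == 0:
    raise RuntimeError("PointTransformer inference currently requires a CUDA-capable GPU")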

If I apply your GPU fix, I run into another problem:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_903889/1110970400.py in <module>
     12     print(name)
     13     pipeline, cfg = load_pipeline(cfg_file, device=device)
---> 14     result = pipeline.run_inference(data)
     15     print(result)
     16     labels[name] = result['predict_labels']

~/Software/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py in run_inference(self, data)
    175         metric = SemSegMetric()
    176 
--> 177         valid_scores, valid_labels = filter_valid_label(
    178             torch.tensor(inference_result['predict_scores']),
    179             torch.tensor(data['label']), model.cfg.num_classes,

~/Software/Open3D-ML/ml3d/torch/modules/losses/semseg_loss.py in filter_valid_label(scores, labels, num_classes, ignored_label_inds, device)
     17     valid_idx = torch.where(torch.logical_not(ignored_bool))[0].to(device)
     18 
---> 19     valid_scores = torch.gather(valid_scores, 0,
     20                                 valid_idx.unsqueeze(-1).expand(-1, num_classes))
     21     valid_labels = torch.gather(valid_labels, 0, valid_idx)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I'm not sure what this is about; I didn't manage to fix it by adding .cuda() to things :smile: .

cosama commented on Jan 12 '22

Oh, my bad, sorry, I forgot to mention that for the GPU fix I actually changed another line in semantic_segmentation.py to get it to work (added .to(device) on line 179):

        metric = SemSegMetric()
        valid_scores, valid_labels = filter_valid_label(
            torch.tensor(inference_result['predict_scores']).to(device),
            torch.tensor(data['label']), model.cfg.num_classes,
            model.cfg.ignored_label_inds, device)
        metric.update(valid_scores, valid_labels)
        log.info(f"Accuracy : {metric.acc()}")
        log.info(f"IoU : {metric.iou()}")

The error occurs after inference, in the pipeline evaluation part. Commenting out the above code wouldn't affect the inference process.
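
For completeness, the fully device-consistent version of that call would look like this (just a sketch; moving only the scores was enough in my case):

# Sketch: move both scores and labels to the pipeline device before filtering,
# so filter_valid_label never mixes CUDA and CPU tensors.
valid_scores, valid_labels = filter_valid_label(
    torch.tensor(inference_result['predict_scores']).to(device),
    torch.tensor(data['label']).to(device), model.cfg.num_classes,
    model.cfg.ignored_label_inds, device)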

ElMehdi-E commented on Jan 12 '22

@cosama @ElMehdi-E Did you guys ever figure out the NotImplementedError? I am looking to test this on a computer that does not have an NVIDIA GPU 😄

bernhardpg commented on Feb 22 '22

@bernhardpg I am looking into this issue; I'll include a fix that covers all the workarounds mentioned above.

sanskar107 commented on Feb 22 '22

@sanskar107 Great to hear, thank you for the swift reply. I am very excited about the PointTransformer network, so I will be looking forward to the fix. Do you have an estimate of when it will be available?

bernhardpg commented on Feb 23 '22