Open3D-ML
illegal memory access with PointTransformer in torch
I am getting a RuntimeError: CUDA error: an illegal memory access was encountered when running the PointTransformer. I have tested this on two different systems with NVIDIA GPUs and got the same error each time.
I am running this in a Docker container built from:
FROM nvidia/cuda:11.1.1-devel-ubuntu20.04
RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
        libgl1-mesa-dev \
        python3 \
        python3-dev \
        python3-pip && \
    pip3 install --no-cache-dir --upgrade pip && \
    rm -rf /var/lib/apt/lists/*
RUN pip install open3d numpy matplotlib tensorboard -U
RUN pip install https://s3.us-west-1.wasabisys.com/open3d-downloads/torch-1.8.2-cp38-cp38-linux_x86_64.whl \
    torchvision==0.9.2+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
And the code I am running is:
import open3d.ml as _ml3d
import open3d.ml.torch as ml3d
import numpy as np
data = {
    "point": np.random.rand(300000).reshape(-1, 3).astype(np.float32) * 100,
    "feat": np.random.rand(300000).reshape(-1, 3).astype(np.float32) * 255,
    "label": np.zeros(100000, dtype=np.float32)
}
cfg = _ml3d.utils.Config.load_from_file("pointtransformer_s3dis.yml")
model = ml3d.models.PointTransformer(**cfg.model)
pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=None, device="cpu", **cfg.pipeline)
pipeline.load_ckpt(ckpt_path="./logs/pointtransformer_s3dis_202109241350utc.pth")
result = pipeline.run_inference(data)
Here is the full error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_11/2004900684.py in <module>
3 pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=None, device="cpu", **cfg.pipeline)
4 pipeline.load_ckpt(ckpt_path="./logs/pointtransformer_s3dis_202109241350utc.pth")
----> 5 result = pipeline.run_inference(data)
open3d/_ml3d/torch/pipelines/semantic_segmentation.py in run_inference(self, data)
165 with torch.no_grad():
166 for unused_step, inputs in enumerate(infer_loader):
--> 167 results = model(inputs['data'])
168 self.update_tests(infer_sampler, inputs, results)
169
torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
open3d/_ml3d/torch/models/point_transformer.py in forward(self, batch)
173
174 for i in range(5):
--> 175 p, f, r = self.encoders[i]([points[i], feats[i], row_splits[i]])
176 points.append(p)
177 feats.append(f)
torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
torch/nn/modules/container.py in forward(self, input)
117 def forward(self, input):
118 for module in self:
--> 119 input = module(input)
120 return input
121
torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
open3d/_ml3d/torch/models/point_transformer.py in forward(self, pxo)
642 identity = feat
643 feat = self.relu(self.bn1(self.linear1(feat)))
--> 644 feat = self.relu(self.bn2(self.transformer2([point, feat, row_splits])))
645 feat = self.bn3(self.linear3(feat))
646 feat += identity
torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
open3d/_ml3d/torch/models/point_transformer.py in forward(self, pxo)
429 feat_q, feat_k, feat_v = self.linear_q(feat), self.linear_k(
430 feat), self.linear_v(feat) # (n, c)
--> 431 feat_k = queryandgroup(self.nsample,
432 point,
433 point,
open3d/_ml3d/torch/models/point_transformer.py in queryandgroup(nsample, points, queries, feat, idx, points_row_splits, queries_row_splits, use_xyz)
679 queries = points
680 if idx is None:
--> 681 idx = knn_batch(points,
682 queries,
683 k=nsample,
open3d/_ml3d/torch/models/point_transformer.py in knn_batch(points, queries, k, points_row_splits, queries_row_splits, return_distances)
733 -1, k).long().cuda(), ans.neighbors_distance.reshape(-1, k).cuda()
734 else:
--> 735 return ans.neighbors_index.reshape(-1, k).long().cuda()
736
737
RuntimeError: CUDA error: an illegal memory access was encountered
@cosama There is a NaN value in one of the point clouds of the S3DIS dataset, which causes the above problem. It is fixed in #454.
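To rule out non-finite values as a cause on your own data, a check along the following lines may help. This is only a minimal sketch using NumPy: the helper name drop_non_finite_points is hypothetical and is not part of Open3D-ML, and it is not the change made in #454.

```python
import numpy as np


def drop_non_finite_points(points, feat=None):
    """Keep only rows whose coordinates (and features, if given) are finite.

    Hypothetical helper for illustration; not an Open3D-ML function.
    Returns (points, feat, number_of_dropped_rows).
    """
    points = np.asarray(points, dtype=np.float32)
    keep = np.isfinite(points).all(axis=1)  # False for rows with NaN or inf
    if feat is not None:
        feat = np.asarray(feat, dtype=np.float32)
        keep &= np.isfinite(feat).all(axis=1)
        feat = feat[keep]
    return points[keep], feat, int((~keep).sum())


points = np.array([[0.0, 1.0, 2.0],
                   [np.nan, 0.0, 0.0],
                   [3.0, 4.0, 5.0]], dtype=np.float32)
clean_points, _, n_dropped = drop_non_finite_points(points)
print(f"dropped {n_dropped} point(s), kept {len(clean_points)}")
```

If nothing is dropped for your point cloud, NaN values are probably not what triggers the error in your case.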
@sanskar107 Thanks for working on this. I will test your solution once I get back to this. I personally would not close issues unless the pull request is merged to master. Hope that happens soon.
@sanskar107 I just managed to test this with your branch from #454 and I still get the same illegal memory access error. As you can see in my issue, I do not actually use the S3DIS dataset, just the training checkpoint; I load my own point cloud from a ply file instead.
Unless I am doing something totally wrong with the data (I do not use any DataLoader, so it might be on the wrong device or something), this issue is still not resolved. In that case, can we reopen this please?
In case I do something wrong here with the data, I feel it would enhance the usability of this library considerably if it could be documented how to load custom data and use them. I don't think that many people using this library for their own work are interested in running models on the training datasets.
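As a rough illustration of what feeding custom data could look like, the sketch below builds the same kind of dictionary that the reproduction snippet in this issue passes to run_inference. The keys "point", "feat" and "label" come from that snippet; the helper name make_inference_data, the shape checks, and the zero-filled int32 dummy labels are assumptions made for this example, not documented Open3D-ML behaviour.

```python
import numpy as np


def make_inference_data(points, colors=None):
    """Build a {"point", "feat", "label"} dict from a custom point cloud.

    Hypothetical helper for illustration. `points` is (N, 3); `colors`
    is an optional (N, 3) array in the 0-255 range, as in the issue.
    """
    points = np.asarray(points, dtype=np.float32)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError(f"expected points of shape (N, 3), got {points.shape}")
    n = points.shape[0]

    if colors is None:
        colors = np.zeros((n, 3), dtype=np.float32)  # assumption: no colour info
    colors = np.asarray(colors, dtype=np.float32)
    if colors.shape != (n, 3):
        raise ValueError(f"expected colors of shape ({n}, 3), got {colors.shape}")

    # Dummy labels: one entry per point, only there to satisfy the pipeline.
    labels = np.zeros(n, dtype=np.int32)
    return {"point": points, "feat": colors, "label": labels}


data = make_inference_data(np.random.rand(1000, 3) * 100,
                           np.random.rand(1000, 3) * 255)
print({key: (value.shape, value.dtype) for key, value in data.items()})
```

The point coordinates could come from any reader (for example a ply file loaded into a NumPy array); the resulting dict would then be passed to pipeline.run_inference(data) as in the snippet above.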
Happy to help if there is something within my ability that can be done.
I updated the description to not require any point cloud anymore, so this should be reproducible on any system without any additional information.
Also, if I set device="gpu" in the above sample, I now get this:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_902918/233953271.py in <module>
13 pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=None, device="gpu", **cfg.pipeline)
14 pipeline.load_ckpt(ckpt_path="./logs/pointtransformer_s3dis_202109241350utc.pth")
---> 15 result = pipeline.run_inference(data)
~/Software/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py in run_inference(self, data)
165 with torch.no_grad():
166 for unused_step, inputs in enumerate(infer_loader):
--> 167 results = model(inputs['data'])
168 self.update_tests(infer_sampler, inputs, results)
169
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/Software/Open3D-ML/ml3d/torch/models/point_transformer.py in forward(self, batch)
173
174 for i in range(5):
--> 175 p, f, r = self.encoders[i]([points[i], feats[i], row_splits[i]])
176 points.append(p)
177 feats.append(f)
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/.local/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
117 def forward(self, input):
118 for module in self:
--> 119 input = module(input)
120 return input
121
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/Software/Open3D-ML/ml3d/torch/models/point_transformer.py in forward(self, pxo)
534 point, row_splits = new_point, new_row_splits
535 else:
--> 536 feat = self.relu(self.bn(self.linear(feat))) # (n, c)
537 return [point, feat, row_splits]
538
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
92
93 def forward(self, input: Tensor) -> Tensor:
---> 94 return F.linear(input, self.weight, self.bias)
95
96 def extra_repr(self) -> str:
~/.local/lib/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
1751 if has_torch_function_variadic(input, weight):
1752 return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1753 return torch._C._nn.linear(input, weight, bias)
1754
1755
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
Not sure what is going on here. Would appreciate if someone could confirm this.
Hello @cosama, I was also having the same problem with both device="gpu" and device="cpu".
I managed to solve it for gpu by adding .cuda() in point_transformer.py in the transform() function for data['point'] and data['feat'] (lines 300 and 302):
data['point'] = torch.from_numpy(points).to(torch.float32).cuda()
if feat is not None:
    data['feat'] = torch.from_numpy(feat).to(torch.float32).cuda() / 255.0
data['label'] = torch.from_numpy(labels).to(torch.int64)
And to use device="cpu", I had to remove .cuda() in knn_batch() (lines 733 and 735):
if return_distances:
    return ans.neighbors_index.reshape(-1, k).long(), ans.neighbors_distance.reshape(-1, k)
else:
    return ans.neighbors_index.reshape(-1, k).long()
I hope this helps.
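Both workarounds above hard-code one device, so each of them only works for one value of the device argument. One possible alternative, shown here purely as a sketch, is to place the result on whatever device the input tensor already lives on. The function name reshape_neighbors is hypothetical, and this is not necessarily how the issue was fixed upstream.

```python
import torch


def reshape_neighbors(neighbors_index, k, reference):
    """Reshape a flat neighbour-index tensor to (n, k) on reference's device.

    Hypothetical helper: instead of calling .cuda() unconditionally, the
    result follows the device of `reference` (e.g. the input points).
    """
    return neighbors_index.reshape(-1, k).long().to(reference.device)


points = torch.rand(8, 3)                          # stays on the CPU here
flat_index = torch.arange(16, dtype=torch.int32)   # stand-in for a kNN result
idx = reshape_neighbors(flat_index, k=2, reference=points)
print(idx.shape, idx.dtype, idx.device)
```

With this pattern the same code path would serve both device="cpu" and device="gpu", as long as the inputs themselves are on the intended device.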
@ElMehdi-E Thank you so much for this.
Yeah, that actually helps a lot. I was worried I was screwing things up horribly here, and this confirms it is not just me.
I can confirm that your cpu fix works, but only if there is an NVIDIA GPU present on the system; otherwise you run into:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
/tmp/ipykernel_28850/3019627965.py in <module>
10 print(name)
11 pipeline, cfg = load_pipeline(cfg_file)
---> 12 result = pipeline.run_inference(data)
13 labels[name] = result['predict_labels']
~/Work/Software/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py in run_inference(self, data)
165 with torch.no_grad():
166 for unused_step, inputs in enumerate(infer_loader):
--> 167 results = model(inputs['data'])
168 self.update_tests(infer_sampler, inputs, results)
169
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/Work/Software/Open3D-ML/ml3d/torch/models/point_transformer.py in forward(self, batch)
173
174 for i in range(5):
--> 175 p, f, r = self.encoders[i]([points[i], feats[i], row_splits[i]])
176 points.append(p)
177 feats.append(f)
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/.local/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
117 def forward(self, input):
118 for module in self:
--> 119 input = module(input)
120 return input
121
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/Work/Software/Open3D-ML/ml3d/torch/models/point_transformer.py in forward(self, pxo)
517 new_row_splits = torch.LongTensor(new_row_splits).to(
518 row_splits.device)
--> 519 idx = furthest_point_sample_v2(point, row_splits,
520 new_row_splits) # (m)
521 new_point = point[idx.long(), :] # (m, 3)
~/Work/Software/Open3D-ML/ml3d/torch/utils/pointnet/pointnet2_utils.py in forward(ctx, xyz, row_splits, new_row_splits)
81 """
82 if not open3d.core.cuda.device_count() > 0:
---> 83 raise NotImplementedError
84
85 if not xyz.is_contiguous():
NotImplementedError:
I feel if you run this on the cpu you usually might not have a gpu available :smile: . So probably worth fixing that.
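The traceback above ends in a bare NotImplementedError with no message, which makes the cause hard to guess. Until a CPU code path exists, a guard with a descriptive message could at least make the failure easier to understand. The following is a sketch only: ensure_cuda_for is a hypothetical helper, and the device count is passed in as a plain integer (in the traceback it comes from open3d.core.cuda.device_count()).

```python
def ensure_cuda_for(op_name, cuda_device_count):
    """Raise a descriptive error when a CUDA-only op runs without a GPU.

    Hypothetical helper for illustration; `cuda_device_count` is the
    number of CUDA devices visible to the process.
    """
    if cuda_device_count <= 0:
        raise NotImplementedError(
            f"{op_name} currently requires a CUDA device, but none was found. "
            "Running it on a CPU-only machine is not supported.")


try:
    ensure_cuda_for("furthest_point_sample_v2", cuda_device_count=0)
except NotImplementedError as err:
    message = str(err)
    print(message)
```

This does not make CPU-only inference work; it only replaces the empty error with one that names the operation and the missing requirement.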
If I apply your gpu fix, I run into another problem:
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_903889/1110970400.py in <module>
12 print(name)
13 pipeline, cfg = load_pipeline(cfg_file, device=device)
---> 14 result = pipeline.run_inference(data)
15 print(result)
16 labels[name] = result['predict_labels']
~/Software/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py in run_inference(self, data)
175 metric = SemSegMetric()
176
--> 177 valid_scores, valid_labels = filter_valid_label(
178 torch.tensor(inference_result['predict_scores']),
179 torch.tensor(data['label']), model.cfg.num_classes,
~/Software/Open3D-ML/ml3d/torch/modules/losses/semseg_loss.py in filter_valid_label(scores, labels, num_classes, ignored_label_inds, device)
17 valid_idx = torch.where(torch.logical_not(ignored_bool))[0].to(device)
18
---> 19 valid_scores = torch.gather(valid_scores, 0,
20 valid_idx.unsqueeze(-1).expand(-1, num_classes))
21 valid_labels = torch.gather(valid_labels, 0, valid_idx)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Not sure what this is about; I didn't manage to fix it by adding cuda to stuff :smile: .
Oh, my bad. Sorry, I forgot to mention it, but for the gpu fix I actually changed another line in semantic_segmentation.py to get it to work (added .to(device) in line 179):
metric = SemSegMetric()

valid_scores, valid_labels = filter_valid_label(
    torch.tensor(inference_result['predict_scores']).to(device),
    torch.tensor(data['label']), model.cfg.num_classes,
    model.cfg.ignored_label_inds, device)

metric.update(valid_scores, valid_labels)
log.info(f"Accuracy : {metric.acc()}")
log.info(f"IoU : {metric.iou()}")
The error occurs after inference, in the pipeline evaluation part. Commenting out the above code wouldn't affect the inference process.
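The device mismatch in this evaluation step arises because torch.gather needs its input and its index tensor on the same device. The sketch below loosely follows the lines visible in the traceback but is not the library's filter_valid_label: the name filter_valid is hypothetical and the body is simplified. It moves both inputs to the target device before gathering.

```python
import torch


def filter_valid(scores, labels, num_classes, ignored_label_inds, device):
    """Drop points with ignored labels, keeping all tensors on one device.

    Simplified, hypothetical stand-in for illustration only.
    """
    scores = scores.to(device)   # (n, num_classes)
    labels = labels.to(device)   # (n,)

    ignored = torch.zeros_like(labels, dtype=torch.bool)
    for ign in ignored_label_inds:
        ignored |= labels == ign

    # valid_idx is derived from `labels`, so it is already on `device`.
    valid_idx = torch.where(torch.logical_not(ignored))[0]
    valid_scores = torch.gather(
        scores, 0, valid_idx.unsqueeze(-1).expand(-1, num_classes))
    valid_labels = torch.gather(labels, 0, valid_idx)
    return valid_scores, valid_labels


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
scores = torch.rand(5, 3)
labels = torch.tensor([0, 1, 2, 1, 0])
valid_scores, valid_labels = filter_valid(scores, labels, 3, [0], device)
print(valid_scores.shape, valid_labels.tolist())
```

Moving the inputs inside the function means callers do not need to remember to add .to(device) at each call site.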
@cosama @ElMehdi-E Did you guys ever figure out the "NotImplementedError: "? I am looking to test this on a computer that does not have an NVIDIA GPU 😄
@bernhardpg I am looking into this issue, I'll include a fix to solve all the workarounds mentioned above.
@sanskar107 Great to hear, thank you for the swift reply. I am very excited about the pointtransformer network, so I will be looking forward to the fix. Any estimate on when you think the fix will be available?