KPConv-PyTorch
Dimension Error
Dear @HuguesTHOMAS,
first of all, thank you very much for your implementation of KPConv. I am using the network to train on colored point clouds reconstructed in 3D from drone images.
Training, validation, and testing work very well, but as soon as I set batch_num=1, I encounter two errors.
First one:
```
Traceback (most recent call last):
File "train_SVGEO.py", line 324, in <module>
trainer.train(net, training_loader, test_loader, config)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/utils/trainer.py", line 200, in train
outputs = net(batch, config)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/models/architectures.py", line 345, in forward
x = block_op(x, batch)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/models/blocks.py", line 636, in forward
x = self.leaky_relu(self.batch_norm_conv(x))
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/models/blocks.py", line 457, in forward
x = self.batch_norm(x)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 178, in forward
self.eps,
File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2279, in batch_norm
_verify_batch_size(input.size())
File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2247, in _verify_batch_size
raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1])
```
Second one during validation:
```
Traceback (most recent call last):
File "train_SVGEO.py", line 324, in <module>
trainer.train(net, training_loader, test_loader, config)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/utils/trainer.py", line 283, in train
self.validation(net, val_loader, config)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/utils/trainer.py", line 299, in validation
self.cloud_segmentation_validation(net, val_loader, config)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/utils/trainer.py", line 487, in cloud_segmentation_validation
outputs = net(batch, config)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/models/architectures.py", line 345, in forward
x = block_op(x, batch)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/models/blocks.py", line 639, in forward
x = self.unary2(x)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/models/blocks.py", line 494, in forward
x = self.batch_norm(x)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/user/KPConv-PyTorch/Experiments/KPConv-PyTorch/models/blocks.py", line 455, in forward
x = x.unsqueeze(2)
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
```
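For reference, the first error can be reproduced in isolation; it seems to be a batch norm layer receiving a single element in training mode (illustrative snippet, not taken from the repository):

```python
import torch

# Isolated reproduction (illustrative): batch norm in training mode rejects an
# input that carries only a single value per channel.
bn = torch.nn.BatchNorm1d(256).train()
x = torch.randn(1, 256, 1)   # a single point with 256 features
bn(x)  # raises: ValueError: Expected more than 1 value per channel when training
```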
I am curious about what is happening there. In another issue you mentioned that you recommend training only with batch_num>=3, so the only reason I am training with a single batch is to investigate its learning behaviour. While training another network with one batch per iteration, I noticed that it learned nothing, so I wanted to see whether KPConv exhibits the same behaviour and whether it is due to the batch size.
Thanks in advance!
Edit: Both errors occur randomly, at different epochs and iterations each time.
Hi @AlanKoschel,
I suspect I know what is going on here. In the batch norm function, I use a squeeze call to get rid of unnecessary dimensions. This means that if the input point cloud batch contains only one point, there is a bug, as this dimension is squeezed away too.
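In other words (a minimal sketch, not the exact code in models/blocks.py):

```python
import torch

# Minimal sketch of the second error: with a single input point, squeeze()
# removes the point dimension as well, so the next unsqueeze(2) has nothing
# left to expand.
x = torch.randn(1, 256, 1)   # [n_points=1, n_features=256, 1]
x = x.squeeze()              # shape becomes [256] instead of the intended [1, 256]
x.unsqueeze(2)               # raises: IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
```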
Could you print the dimensions of your batch.points tensors (one per layer)? If any of them happens to be [1, 3], then you have your culprit.
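For instance (assuming batch.points is a list holding one [N_i, 3] tensor per layer, as in the provided dataset classes), something like this right before outputs = net(batch, config) in utils/trainer.py should do:

```python
# Quick check of the per-layer point counts (hypothetical debug snippet):
for layer_i, pts in enumerate(batch.points):
    print('layer {:d}: points shape {}'.format(layer_i, tuple(pts.shape)))
```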
There would be a way to fix this squeeze call so that no error is thrown (using reshape instead), but I don't think it should be corrected, as batch normalization is not supposed to be used on a single element. In your case, I suggest not using batch norm for your experiment, which, if I understood correctly, is for debugging purposes anyway. See the parameter:
https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/73e444d486cd6cb56122c3dd410e51c734064cfe/train_S3DIS.py#L151-L152
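i.e., in the Config subclass of your train_SVGEO.py, something like this (same parameter name as in train_S3DIS.py):

```python
    # Batch normalization parameters
    use_batch_norm = False   # deactivate batch norm throughout the network
```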
Hi @HuguesTHOMAS, thanks for your detailed explanation, I will check that soon!