What is Voxel Size and how to choose the correct number?
In the semantic segmentation example indoor.py, line 138, there is a hyperparameter called voxel_size. The original value is 0.02, and I have tried different numbers for it. It looks like voxel_size has a significant effect on model performance. So what is this number, and how do I choose a correct value for it?
Voxel size determines the resolution of the space.
Let's say we have a 100m x 100m x 25m LIDAR scan. Depending on the voxel size we select, the grid resolution becomes:
| voxel size | resolution |
|---|---|
| 1m | 100 x 100 x 25 |
| 50cm | 200 x 200 x 50 |
| 5cm | 2000 x 2000 x 500 |
The network can see more detail if you use a small voxel size, but it will be correspondingly slower. This is the same as with the 2D CNNs you are familiar with: high-resolution images require more computation.
Similarly, you can't expect a 2D CNN trained on 100x100 images to work well on 10x10 or 1000x1000 images at test time. You have to train with the same resolution / voxel size that you will use at test time.
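To make this concrete, here is a minimal sketch of the voxel size / resolution trade-off, assuming `ME.utils.sparse_quantize` with a `quantization_size` argument as used in the MinkowskiEngine examples (the point cloud is synthetic):

```python
import numpy as np
import MinkowskiEngine as ME

# Synthetic stand-in for a LIDAR scan: 100k random points
# spread over a 100m x 100m x 25m volume.
points = np.random.rand(100_000, 3) * np.array([100.0, 100.0, 25.0])

for voxel_size in [1.0, 0.5, 0.05]:
    # sparse_quantize maps each point to its voxel and keeps
    # one coordinate per occupied voxel.
    coords = ME.utils.sparse_quantize(points, quantization_size=voxel_size)
    print(f"voxel_size={voxel_size}: {len(coords)} occupied voxels")
```

With a 1m voxel the grid has only 100 x 100 x 25 cells, so many points share a voxel; at 5cm nearly every point keeps its own voxel, and the network sees correspondingly more detail at a higher compute cost.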
Thanks for your explanation. I'm trying to understand this a bit more clearly:
```python
import MinkowskiEngine as ME


def create_input_batch(batch, is_minknet, device="cuda", quantization_size=0.05):
    if is_minknet:
        print("pre", batch["coordinates"][:, 1:])
        print("pre", batch["coordinates"][:, 1:].shape)
        # Scale the xyz columns (column 0 is the batch index) so that
        # one unit corresponds to one voxel of size quantization_size.
        batch["coordinates"][:, 1:] = batch["coordinates"][:, 1:] / quantization_size
        print("post", batch["coordinates"][:, 1:])
        print("post", batch["coordinates"][:, 1:].shape)
        return ME.TensorField(
            coordinates=batch["coordinates"],
            features=batch["features"],
            device=device,
        )
    else:
        return batch["coordinates"].permute(0, 2, 1).to(device)
```
Here is the output with quantization_size = 0.5:

```
pre tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  1.],
        [ 0.,  0.,  2.],
        ...,
        [49., 49., 47.],
        [49., 49., 48.],
        [49., 49., 49.]])
pre torch.Size([1000000, 3])
post tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  2.],
        [ 0.,  0.,  4.],
        ...,
        [98., 98., 94.],
        [98., 98., 96.],
        [98., 98., 98.]])
post torch.Size([1000000, 3])
```
So voxel_size = 0.5 does not change the number of points in a batch.
But it has scaled the maximum (x, y, z) coordinates from (49, 49, 49) to (98, 98, 98).
It's confusing because before quantization I had 8 x 50x50x50 = 1,000,000 points, and after quantization I still have the same 1,000,000 points.
Does that mean we are now sampling 1,000,000 points from high-resolution 3D data?
The reason you got the same number is that you used TensorField, which is a wrapper for a continuous point cloud and keeps every input point.
You can call .sparse() to convert the TensorField into a SparseTensor, which will show you the number of unique coordinates.
In your run, dividing the integer grid 0-49 by 0.5 produces the even coordinates 0, 2, ..., 98, so no two points land in the same voxel and .sparse() would still report 1,000,000 unique coordinates; a quantization_size larger than 1 would start merging points.
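Here is a minimal sketch of that difference, assuming the TensorField / .sparse() API of MinkowskiEngine 0.5 (the grid size and point count are made up for illustration):

```python
import torch
import MinkowskiEngine as ME

N = 10_000
# Random points on a coarse 20 x 20 x 20 grid, so many points
# fall into the same voxel. Column 0 is the batch index.
xyz = torch.randint(0, 20, (N, 3)).float()
coords = torch.cat([torch.zeros(N, 1), xyz], dim=1)
feats = torch.ones(N, 1)

tfield = ME.TensorField(coordinates=coords, features=feats)
print(tfield.C.shape[0])   # N: a TensorField keeps every input point

stensor = tfield.sparse()  # quantize and deduplicate
print(stensor.C.shape[0])  # fewer than N: one row per unique voxel
```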