
What is Voxel Size and how to choose the correct number?

Open zhaopku opened this issue 4 years ago • 3 comments

In the semantic segmentation example indoor.py, line 138, there is a hyperparameter called voxel_size. The original value is 0.02, and I have tried different values. It looks like voxel_size has a significant effect on model performance. So what is this number, and how do I choose a correct value for it?

zhaopku avatar Jun 27 '21 23:06 zhaopku

Voxel size determines the resolution of the space.

Let's say that we have a 100m x 100m x 25m LIDAR scan. If we set the voxel size to be:

voxel size    resolution
1 m           100 x 100 x 25
50 cm         200 x 200 x 50
5 cm          2000 x 2000 x 500
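
To make the table concrete, here is the arithmetic as a small Python sketch (the extents and voxel sizes are the ones from the example above):

# Grid resolution is just the scan extent divided by the voxel size.
extent = (100.0, 100.0, 25.0)  # 100m x 100m x 25m LIDAR scan

for voxel_size in (1.0, 0.5, 0.05):  # 1m, 50cm, 5cm
    resolution = tuple(int(e / voxel_size) for e in extent)
    print(voxel_size, resolution)
# 1.0  -> (100, 100, 25)
# 0.5  -> (200, 200, 50)
# 0.05 -> (2000, 2000, 500)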

The network can see all the details if you use a small voxel size, but it will be slower accordingly. This is the same as with the 2D CNNs you are familiar with: high-resolution images require more computation.

Similarly, you can't expect a 2D CNN trained on 100x100 images to work well on 10x10 or 1000x1000 images at test time. You have to train at the same resolution / voxel size that you will use at test time.
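
To see this in MinkowskiEngine itself, here is a rough sketch using ME.utils.sparse_quantize, with random points standing in for a real scan (exact counts will vary from run to run):

import numpy as np
import MinkowskiEngine as ME

# 100k random points in a 100m x 100m x 25m volume, standing in for a scan.
coords = np.random.rand(100_000, 3) * np.array([100.0, 100.0, 25.0])

for voxel_size in (1.0, 0.5, 0.05):
    # sparse_quantize keeps one coordinate per occupied voxel.
    unique_coords = ME.utils.sparse_quantize(coords, quantization_size=voxel_size)
    print(f"voxel_size={voxel_size}: {len(unique_coords)} occupied voxels")
# Smaller voxels -> more occupied voxels -> more detail, but more computation.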

chrischoy avatar Jul 01 '21 18:07 chrischoy

Thanks for your explanation. I'm trying to understand this a bit more clearly:

import MinkowskiEngine as ME


def create_input_batch(batch, is_minknet, device="cuda", quantization_size=0.05):
    if is_minknet:
        print("pre", batch["coordinates"][:, 1:])
        print("pre", batch["coordinates"][:, 1:].shape)
        # Scale the (x, y, z) columns (column 0 is the batch index) so that one
        # unit in the new coordinates corresponds to one voxel of size
        # `quantization_size` in the original units.
        batch["coordinates"][:, 1:] = batch["coordinates"][:, 1:] / quantization_size
        print("post", batch["coordinates"][:, 1:])
        print("post", batch["coordinates"][:, 1:].shape)
        return ME.TensorField(
            coordinates=batch["coordinates"],
            features=batch["features"],
            device=device,
        )
    else:
        # Dense (non-Minkowski) path: reshape to (batch, channels, num_points).
        return batch["coordinates"].permute(0, 2, 1).to(device)

Here is the output with quantization_size = 0.5:

pre tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  1.],
        [ 0.,  0.,  2.],
        ...,
        [49., 49., 47.],
        [49., 49., 48.],
        [49., 49., 49.]])
pre torch.Size([1000000, 3])
post tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  2.],
        [ 0.,  0.,  4.],
        ...,
        [98., 98., 94.],
        [98., 98., 96.],
        [98., 98., 98.]])
post torch.Size([1000000, 3])

So voxel_size = 0.5 does not change the number of points in a batch.

But it has scaled the coordinates: the maximum (x, y, z) location went from (49, 49, 49) to (98, 98, 98).

It's confusing because before quantization I had 8 x (50x50x50) = 1,000,000 points, and after quantization I still have the same 1,000,000 points.

Does that mean we are now sampling 1,000,000 points from higher-resolution 3D data?

asadabbas09 avatar Jul 09 '21 07:07 asadabbas09

The reason you got the same number is that you used a TensorField, which is a wrapper for a continuous point cloud. You can call .sparse() to convert the TensorField to a SparseTensor, which will show you the number of unique coordinates.
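
For example, here is a minimal sketch mirroring your printout (1M points on a 50x50x50 integer grid, batch index in column 0), just to illustrate the TensorField vs. SparseTensor counts:

import torch
import MinkowskiEngine as ME

# 1M points on a 50x50x50 integer grid, so at most 125,000 distinct voxels.
xyz = torch.randint(0, 50, (1_000_000, 3)).float()
batch_idx = torch.zeros(1_000_000, 1)  # a single batch, index 0
coords = torch.cat([batch_idx, xyz], dim=1)
feats = torch.ones(1_000_000, 1)

tfield = ME.TensorField(coordinates=coords, features=feats)
print(tfield.C.shape)   # torch.Size([1000000, 4]) -- every input point is kept

stensor = tfield.sparse()  # quantize: one entry per unique voxel
print(stensor.C.shape)     # at most [125000, 4] -- only unique coordinates remain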

chrischoy avatar Jul 10 '21 04:07 chrischoy