KPConv-PyTorch
Dimensionality, Potential-based sampling, input spheres and batch_neighbors in Classification
Hi Hugues,
first of all, thank you for releasing the PyTorch version of your code, and especially for the link to your PhD thesis in one of the other issues.
- I have one question with some context about the data loading and sampling (1), and a follow-up question (2):
[Context] From my understanding, the ModelNet40Dataset class loads the entire (subsampled) dataset at init into the attributes self.input_points etc. This means each list entry in self.input_points is the entire (point) data of one point cloud as a numpy array? Now, self.potentials has the dimensionality len(dataset.input_labels), which means it has as many entries as there are point cloud files in the dataset. Is that still correct? epoch_n is the number of models used in one epoch, so epoch steps times batch number, or the entire dataset if smaller?
[Question] If all of the above is true, what exactly is the purpose of gen_indices = np.argpartition(self.potentials, self.dataset.epoch_n)[:self.dataset.epoch_n] in the sampler class? It seems to do a smallest-potential regular sampling like on p. 127 of your thesis, but with entire point clouds instead of points within one point cloud? Are gen_indices the indices of entire point clouds, as I thought?
I have another, more general question about ModelNet40Batch and the dimensionality of the input_list: I suspect that the input_list passed to the batch class is the list returned by ModelNet40Dataset's getitem. If so, why is there a layer number L (the comment says number of layers) that can be computed from the input_list? That contradicts some of my assessments from question 1, which means I got something wrong in my understanding.
Finally, a question regarding the theory: As far as I understood, the kernel point convolution operation is applied to each input point of the layer and considers its spherical neighborhood in the computation of the kernel function (g in your paper). All neighborhood relationships are saved in a neighbor matrix. Furthermore, each KPConv layer has exactly one kernel point "arrangement". This means each (subsampled) input point of one KPConv layer uses that layer's single kernel point arrangement and the individual input point's neighborhood to compute an output?
If the above is correct, why does the batch_neighbor() computation make sense? Wouldn't that mean that the spherical neighbors of one point could reside in another point cloud of the same batch? I have the feeling that this has something to do with the "shadow neighbors" in your thesis. Unfortunately, I can't entirely make sense of it.
One minor clarification: what exactly are support points? From your paper, I assumed that these are the barycenter points of the voxels from grid subsampling. I found one thesis discussion video where they said that support points are just another name for kernel points? From your code, I think the first interpretation (barycenters) is correct? Could you clarify?
Is it possible to get your PhD thesis somewhere in a format that supports ctrl+f and marking of text parts? That would be awesome.
Sorry for the very long questionnaire. If you want, I can split it into separate issues.
Thank you very much for your help and your work!
Best regards
Hi @TobiasMascetta,
Thank you for your interest in my work and the nice comments. Here are some answers.
[Context] From my understanding, the ModelNet40Dataset class loads the entire (subsampled) dataset at init into the attributes self.input_points etc. This means each list entry in self.input_points is the entire (point) data of one point cloud as a numpy array? Now, self.potentials has the dimensionality len(dataset.input_labels), which means it has as many entries as there are point cloud files in the dataset. Is that still correct? epoch_n is the number of models used in one epoch, so epoch steps times batch number, or the entire dataset if smaller?
Yes.
What exactly is the purpose of gen_indices = np.argpartition(self.potentials, self.dataset.epoch_n)[:self.dataset.epoch_n] in the sampler class? It seems to do a smallest-potential regular sampling like on p. 127 of your thesis, but with entire point clouds instead of points within one point cloud? Are gen_indices the indices of entire point clouds, as I thought?
Yes. The idea is that I wanted a random sampling that samples each model the same number of times, even if the epoch is smaller than the total number of models. Potentials were a simple way to implement that. Another solution would have been to draw a random permutation of the dataset and, whenever we reach the end, resample a new random permutation, etc.
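In pseudo-code, the mechanism is roughly the following (a minimal sketch with illustrative names and an illustrative potential update, not the actual sampler code):

```python
import numpy as np

n_models = 1000                               # point clouds in the dataset
epoch_n = 300                                 # models used in one epoch
potentials = np.random.rand(n_models) * 0.1   # small random init breaks ties

for epoch in range(3):
    # indices of the epoch_n models with the smallest potentials
    gen_indices = np.argpartition(potentials, epoch_n)[:epoch_n]
    # raise the chosen potentials (plus a little noise to keep the order random),
    # so these models are picked less often in the following epochs
    potentials[gen_indices] += 1.0 + 0.1 * np.random.rand(epoch_n)
    # gen_indices index whole point clouds, which are then fed to the dataloader
```

Over several epochs, every model ends up being drawn roughly the same number of times, even when epoch_n is smaller than the total number of models.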
I suspect that the input_list to the batch class is the returned list from ModelNet40Dataset getitem.
Yes that's right.
If so, why is there a layer number L (the comment says number of layers)?
The input_list returned by getitem always has the same elements:
- the point clouds of each layer.
- the neighbor indices of each layer.
- the pooling indices from each layer to the next.
- the upsampling indices from each layer to the previous one.
- the input features (at first layer)
- the input labels (at first layer)
- a bunch of other variables...
As you can see, for the points, neighbors, poolings, and upsamplings, there is more than just one element: there are actually as many elements in the list as there are layers in the network. So in total, the list has 4 times the number of layers + 5 elements, which is why we can get the number of layers from the list with: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/1defcd75cf7c0399704a6a9f63d3a550bfb8c1c9/datasets/ModelNet40.py#L689-L690
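In other words, the count boils down to this small helper (a sketch of the arithmetic, with the "4 times the number of layers + 5" layout taken from the explanation above rather than re-checked against the repo):

```python
def num_layers(input_list):
    # points, neighbors, poolings and upsamplings each contribute L entries (4 * L),
    # and the remaining 5 entries are the features, labels and other variables
    return (len(input_list) - 5) // 4
```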
Finally, a question regarding the theory: As far as I understood, the kernel point convolution operation is applied to each input point of the layer and considers its spherical neighborhood in the computation of the kernel function (g in your paper). All neighborhood relationships are saved in a neighbor matrix. Furthermore, each KPConv layer has exactly one kernel point "arrangement". This means each (subsampled) input point of one KPConv layer uses that layer's single kernel point arrangement and the individual input point's neighborhood to compute an output?
Yes.
If the above is correct, why does the batch_neighbor() computation make sense? Wouldn't that mean that the spherical neighbors of one point could reside in another point cloud of the same batch? I have the feeling that this has something to do with the "shadow neighbors" in your thesis. Unfortunately, I can't entirely make sense of it.
No, neighbors from one cloud cannot reside in another; we make sure of that in batch_neighbor(). The neighbors of each cloud are computed separately, and then we stack everything and take care of offsetting the neighbor indices by the right value so that they still point to the new positions of the points in the whole stacked batch structure. Let's say there are 3 point clouds in our batch, like this:
| cloud1 | cloud2 | cloud3 |
points = [p0, p1, ..., pi, pi+1, ..., pj, pj+1, ..., pk] (pX is a point in R^3)
| neighb1 | neighb2 | neighb3 |
neighbors = [n0, n1, ..., ni, ni+1, ..., nj, nj+1, ..., nk] (nX is a vector of indices in N)
In that case:
- the neighbors n0, n1, ..., ni can only have values between 0 and i, and can thus only be neighbors from cloud1
- the neighbors ni+1, ..., nj can only have values between i+1 and j, and can thus only be neighbors from cloud2
- the neighbors nj+1, ..., nk can only have values between j+1 and k, and can thus only be neighbors from cloud3
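A pure scikit-learn/NumPy sketch of this stacking-with-offset idea (illustrative only; the repo uses a compiled C++ routine for this):

```python
from sklearn.neighbors import KDTree

def stacked_neighbors(clouds, radius):
    # clouds: list of (Ni, 3) arrays; returns one index array per stacked point,
    # with indices pointing into the stacked batch
    all_neighbors = []
    offset = 0
    for cloud in clouds:
        tree = KDTree(cloud)
        inds = tree.query_radius(cloud, r=radius)   # indices local to this cloud
        all_neighbors.extend(neighb + offset for neighb in inds)
        offset += cloud.shape[0]                    # shift the next cloud's indices
    return all_neighbors
```

Because each KDTree only contains the points of its own cloud, a neighbor index can never cross the cloud boundaries.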
"shadow neighbors" is something else. Spherical neighborhood do not all have the same number of neighbors, so these vectors nX
should be of different length. To be usable as a matrix on GPU, we have to fix the size to a maximum value maxN
. Thus if a neigbhorhood nX
have less than maxN
valid neighbors, we fill the rest of the line in the matrix with the value k+1
, which does not exists. We call that shadow neighbors.
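A sketch of how the ragged neighbor lists can be packed into a fixed-size matrix with shadow neighbors (illustrative, not the repo's code; in the network, the stacked features are padded with an extra zero row at that index so shadow neighbors contribute nothing):

```python
import numpy as np

def pad_neighbors(neighbors, n_total, max_n):
    # n_total points are indexed 0..n_total-1, so n_total itself is the "shadow" index
    mat = np.full((len(neighbors), max_n), n_total, dtype=np.int64)
    for i, neighb in enumerate(neighbors):
        neighb = neighb[:max_n]           # truncate if a neighborhood is too large
        mat[i, :len(neighb)] = neighb
    return mat
```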
What exactly are support points? From your paper, I assumed that these are the barycenter points of the voxels from grid subsampling. I found one thesis discussion video where they said that support points are just another name for kernel points? From your code, I think the first interpretation (barycenters) is correct? Could you clarify?
Generally speaking, we differentiate query points and support points. When computing neighbors, the query points are the positions at which you are looking for neighbors, and the support points are the points you are searching among. In KPConv, we use the same denomination. For example, when going from one layer to another, the support points are the points of the current layer, and the query points are the points of the next layer. For a convolution block within a layer, the support points and the query points are the same: they are the points of the layer. In all cases, these are barycenter points from the grid subsampling, because that is how we get the points of every layer. However, they are never another name for kernel points.
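With a plain radius search, the convention looks like this (a sketch with scikit-learn, not the compiled neighbor routine of the repo):

```python
from sklearn.neighbors import KDTree

def radius_search(queries, supports, radius):
    tree = KDTree(supports)                       # built on the support points
    return tree.query_radius(queries, r=radius)   # one index array per query point

# Within a layer: queries == supports (the layer's own subsampled points).
# From layer l to layer l+1 (pooling): supports are the points of layer l,
# queries are the points of layer l+1.
```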
- Here is a link to my Word file. Hope that works for you.
Best, Hugues
Hi Hugues,
thank you for your fast response and the Word doc; that is exactly what I needed!
I would have some follow-up questions regarding segmentation and input spheres vs neighborhoods (with S3DIS.py as an example).
[Context] From my understanding, there are two types of spheres: firstly the input spheres, and secondly the spherical neighborhoods. The input spheres are the KPConv version of a (cropped) input image to a standard CNN, and the neighborhoods are the KPConv version of the 3x3 kernel window neighbors in a standard convolutional layer. Each input sphere therefore itself contains several spherical neighborhoods. Is all of that correct?
Segmentation: The center of an input sphere is found via potential-based regular sampling (p. 127 of your PhD), and the input sphere itself is created via KDTree query_radius, where all points within a radius (config.in_radius) are found. These two steps are done in the potential_item() method. The neighborhood sphere is indirectly created in common.py classification_inputs() or segmentation_inputs() via the neighbor matrix from the batch_neighbors cpp functions. They use another radius, r_normal.
Classification: In classification, KPConv just scales the entire object into a sphere as the input sphere. This is indirectly done by handing the entire subsampled point cloud to batch_neighbors().
Is all of that correct too?
[Questions]
1. If all of the above is correct, what is the reason for using specifically spherical inputs? I understand the reason for spherical neighborhoods in the context of KP convolutions, but what is the reasoning behind spherical inputs and not, for example, cubical inputs?
2. In your PhD thesis, IV.2.8.c (Scene Segmentation), you wrote "The spheres containing the input subclouds are smaller than the whole dataset but large enough to cover several objects". I am not quite sure how to reconcile that with everything I wrote above. I thought each input sphere only contains data from one point cloud file. Did I just misinterpret what you meant by dataset in that sentence, or did I get something wrong in my understanding?
3. The radii of the input_sphere and of the neighborhoods are, at least theoretically, only dependent in the sense that the neighborhoods of one input_sphere should not have a larger radius than the input sphere itself? Are there other theoretical dependencies, or is the rest down to practical considerations and testing?
4. In segmentation, the potentials (centers of input spheres) are found via the list pot_trees. To me, it seems that each KDTree in pot_trees is a subsampled version of its corresponding KDTree in input_tree. But each tree in input_tree is already generated from a subsampled cloud file. Is this double subsampling only for better computation time, or is there another reason I am missing?
5. In segmentation, although it is unlikely, it is theoretically possible that one batch contains more than one input sphere from the same original point cloud file. Is that correct?
Thank you very much again for your help, and especially for still being active here answering questions and issues!
Best regards
Is all of that correct too?
Yes, all of this is correct.
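For illustration, the sphere extraction you describe roughly amounts to the following sketch (simplified; the names and the exact potential update differ from the repo's potential_item implementation):

```python
import numpy as np
from sklearn.neighbors import KDTree

def pick_input_sphere(points, potentials, in_radius):
    tree = KDTree(points)                             # in the repo, the trees are precomputed
    center_i = np.argmin(potentials)                  # smallest-potential point
    center = points[center_i].reshape(1, -1)
    inds = tree.query_radius(center, r=in_radius)[0]  # all points inside the sphere
    # raise the potentials of the covered points so the next sphere lands elsewhere
    dists = np.linalg.norm(points[inds] - center, axis=1)
    potentials[inds] += np.square(1.0 - dists / in_radius)
    return inds
```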
1. If all of the above is correct, what is the reason for using specifically spherical inputs? I understand the reason for spherical neighborhoods in the context of KP convolutions, but what is the reasoning behind spherical inputs and not, for example, cubical inputs?
I could ask the same about cubical inputs: is there a good reason to choose cubes and not spheres? When I thought about this I compared both and decided that spheres would be the better representation. I never compared them experimentally, but I don't think it matters too much which one you choose. I guess the absence of corners in spheres made me think that mistakes at the border of input regions would be more limited than with cubes.
2. In your PhD thesis, IV.2.8.c (Scene Segmentation), you wrote "The spheres containing the input subclouds are smaller than the whole dataset but large enough to cover several objects". I am not quite sure how to reconcile that with everything I wrote above. I thought each input sphere only contains data from one point cloud file. Did I just misinterpret what you meant by dataset in that sentence, or did I get something wrong in my understanding?
You can replace "dataset" with "scene" or "point cloud of a dataset" in this sentence. What I mean is that a scene of a dataset can typically cover an area of up to hundreds of meters; an input sphere is sampled in this scene and only covers a few meters, which is a lot smaller than the whole scene, but still enough to contain several objects.
3. The radii of the input_sphere and of the neighborhoods are, at least theoretically, only dependent in the sense that the neighborhoods of one input_sphere should not have a larger radius than the input sphere itself? Are there other theoretical dependencies, or is the rest down to practical considerations and testing?
Well, if you really want, in theory you could even have a larger neighborhood_radius than input_radius, but it would not make any sense; it would be like having an image with a single pixel and trying to do deep learning with it. There are no theoretical limits, only practical considerations, which are related to the third variable, subsampling_dl:
- the ratio input_radius / subsampling_dl controls the number of points in the sphere; it corresponds to the number of pixels along one dimension of an image (except you have to think of the image as round and not rectangular)
- the ratio neighborhood_radius / subsampling_dl controls the number of points in a convolution neighborhood, similarly to the size of a 3x3 or 7x7 kernel in images. Again, you have to take into consideration that input points are subsampled by voxels, but convolution neighborhoods are spheres. So to make a parallel with images, we want neighborhood_radius / subsampling_dl to be quite small, like a 3x3 kernel, and we want input_radius / subsampling_dl to be large, like a full-size image.
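As a purely illustrative numeric example (these values are not the repo's defaults):

```python
subsampling_dl = 0.04   # grid/voxel size in meters
in_radius = 1.5         # input sphere radius in meters
conv_radius = 0.10      # first-layer neighborhood radius in meters

print(in_radius / subsampling_dl)    # ~37: "pixels" across the input, like the image size
print(conv_radius / subsampling_dl)  # ~2.5: neighborhood extent, like a small 3x3/5x5 kernel
```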
4. In segmentation, the potentials (centers of input spheres) are found via the list pot_trees. To me, it seems that each KDTree in pot_trees is a subsampled version of its corresponding KDTree in input_tree. But each tree in input_tree is already generated from a subsampled cloud file. Is this double subsampling only for better computation time, or is there another reason I am missing?
Yes, it is for better computation time only. We can do this because input_radius is so much bigger than subsampling_dl, and we do not need to pick a sphere at every point location; we only need to pick spheres regularly in space with some overlap. The subsampling size of pot_trees dictates the overlap that we get.
5. In segmentation, although it is unlikely, it is theoretically possible that one batch contains more than one input sphere from the same original point cloud file. Is that correct?
It is more than likely actually, it happens a lot. In S3DIS there are only 5 training clouds, and many batches contain more than 5 input spheres. Even if batches contained fewer than 5 spheres, with the randomness it would still be very likely that one batch contains more than one input sphere from the same original point cloud file.
Hi Hugues,
thank you very much for your detailed response, I think most of my questions are now cleared up thanks to you.
I am asking these detailed questions because we are trying to build a GAN-like model for domain transfer with your KPConv as the backbone. I think on the issue page of your TensorFlow implementation there is now also an issue from someone trying a GAN approach.
Considering your architecture, we thought about using KP-FCNN (segmentation). But since we want the entire point cloud scene to be domain-transferred, I am currently thinking about just putting in the entire cloud as the input sphere (like for classification).
I suppose you have good reasons to use only subclouds (= part of one scene) for segmentation. Are those considerations mainly based on computation time? So in our case, do we just have to wait longer, or were there accuracy or other non-timing-related issues when segmenting entire point clouds?
Or is there a possibility to draw subclouds from one point cloud file until the entire point cloud is segmented at test time, especially considering overlapping subcloud input spheres?
Also, just to clarify: in the subcloud-based segmentation you use reprojection. This essentially means you take the output of KPFCNN and, for each output point, find the closest point of the input sphere and assign the output feature to that point. Is that correct?
Thank you very much again, I hope I don't bother you too much!
Best regards
I suppose you have good reasons to use only subclouds (= part of one scene) for segmentation. Are those considerations mainly based on computation time? So in our case, do we just have to wait longer, or were there accuracy or other non-timing-related issues when segmenting entire point clouds?
The main issue is memory consumption. Big scenes like S3DIS cannot fit in the GPU memory.
Or is there a possibility to draw subclouds from one point cloud file until the entire point cloud is segmented at test time, especially considering overlapping subcloud input spheres?
Yes, you can do that. If you want to use overlapping subcloud input spheres, you just have to activate the potentials in the dataset sampling to ensure every point in the scene gets tested. Or you can change the sampling strategy entirely: you would have to go into the dataset class and change the way input point clouds are loaded.
Also, just to clarify: in the subcloud-based segmentation you use reprojection. This essentially means you take the output of KPFCNN and, for each output point, find the closest point of the input sphere and assign the output feature to that point. Is that correct?
Actually, it is the opposite: for each point of the input sphere, find the closest output point. Otherwise some input points could end up without any prediction.
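As a sketch, this reprojection direction can be written as a simple nearest-neighbor lookup (illustrative, not the repo's test code):

```python
from sklearn.neighbors import KDTree

def reproject(input_points, output_points, output_preds):
    tree = KDTree(output_points)               # search among the network's output points
    _, idx = tree.query(input_points, k=1)     # closest output point for each input point
    return output_preds[idx.squeeze()]         # every input point gets a prediction
```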