
Question Regarding KPConv Trainable Parameters

Open yuvalH9 opened this issue 3 years ago • 1 comment

Hi, first of all I must say that I enjoyed reading your paper, and the supplied GitHub repository is very useful - thanks for that!

I have a question regarding the KPConv block. When I call .parameters() on a KPConv block I get two parameters: the first one is the weight tensor (with dimensions K x D_out x D_in) and the second is another parameter of size K x 3.

When I dug into the code, I saw that you define the kernel points in the kernel point init function: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/1cd3742fc57e5ba9d21f7909c78a05a971781351/models/blocks.py#L222

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/1cd3742fc57e5ba9d21f7909c78a05a971781351/models/blocks.py#L234

But with requires_grad=False.
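
For reference, this is roughly how I inspect it (a minimal sketch, assuming `conv` is an already-constructed KPConv instance from `models/blocks.py`):

```python
# `conv` is assumed to be an instantiated KPConv block from models/blocks.py
for name, p in conv.named_parameters():
    print(name, tuple(p.shape), "requires_grad =", p.requires_grad)

# This prints two entries: the weight tensor (requires_grad=True)
# and the kernel point positions of size (K, 3) (requires_grad=False).
```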

My question is: why do you define the kernel points as a network parameter if you do not train them? Or are they indeed trainable in some way that I missed?

Thanks Yuval

yuvalH9 avatar Jun 22 '21 19:06 yuvalH9

Hi @yuvalH9,

Thanks for your interest in my work. This is indeed a good question. I defined them as parameters because I felt that this is what they are: even though we do not train them, the original kernel point positions are part of what defines the network. They need to be saved along with the weights in the network checkpoints, so having them as parameters makes sense.
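
To illustrate the point, here is a toy sketch (not the repository's actual class): a Parameter created with requires_grad=False still appears in .parameters() and in the checkpoint, but it never receives gradients.

```python
import torch
import torch.nn as nn

class ToyKPConv(nn.Module):
    """Toy module mimicking how fixed kernel points live next to trainable weights."""
    def __init__(self, K=15, in_dim=64, out_dim=128):
        super().__init__()
        # Trainable convolution weights
        self.weights = nn.Parameter(torch.randn(K, in_dim, out_dim))
        # Fixed kernel point positions: part of the model definition,
        # saved in checkpoints, but never updated by the optimizer
        self.kernel_points = nn.Parameter(torch.randn(K, 3), requires_grad=False)

toy = ToyKPConv()
print(list(toy.state_dict().keys()))   # ['weights', 'kernel_points'] -> both saved
for name, p in toy.named_parameters():
    print(name, p.requires_grad)       # weights: True, kernel_points: False
```

The same checkpointing behaviour could also be obtained with register_buffer, the difference being that a buffer does not show up in .parameters().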

We could have decided that every KPConv layer always uses the same kernel disposition aligned in the same direction, in which case we would not have needed to save them. Instead, we chose to randomly rotate the kernel points every time. This ensures that if the kernel disposition has a bias or an asymmetry, it will be oriented differently for every layer and thus compensated for.
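
As a rough sketch of that initialization idea (a hypothetical helper, not the exact code in kernel_points.py): each layer draws its own random rotation and applies it to the canonical kernel disposition.

```python
import numpy as np

def random_rotation_3d(rng=np.random):
    """Draw a random 3D rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))          # fix the column signs
    if np.linalg.det(q) < 0:          # make sure it is a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

def init_layer_kernel_points(canonical_kp):
    """canonical_kp: (K, 3) array holding the optimized kernel disposition."""
    R = random_rotation_3d()
    # Every layer gets its own orientation, so any asymmetry in the disposition
    # points in a different direction at each layer.
    return canonical_kp @ R.T
```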

A final note about requires_grad=False: a long time ago, before thinking about deformable KPConv, I actually tried to train these positions as you suggest. It did not give a noticeable improvement, because you are deforming the kernel globally, which does not change things much. I then thought of having multiple possible point dispositions (a parameter of size N * K * 3), but that was not really a good way to implement things (it multiplies the computation time and memory). Eventually, all these reflections led to the deformable idea, which in my opinion is the best way to achieve a trainable kernel disposition.
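
For what it's worth, the conceptual difference can be sketched like this (toy code, not the actual deformable KPConv, which predicts the offsets with a rigid KPConv rather than a linear layer):

```python
import torch
import torch.nn as nn

class GloballyTrainableKP(nn.Module):
    """Early attempt: a single trainable (K, 3) disposition shared by every query point."""
    def __init__(self, K=15):
        super().__init__()
        self.kernel_points = nn.Parameter(torch.randn(K, 3))   # trained, but deforms globally

class DeformableKPSketch(nn.Module):
    """Deformable idea: per-point offsets predicted from the input features."""
    def __init__(self, K=15, in_dim=64):
        super().__init__()
        self.kernel_points = nn.Parameter(torch.randn(K, 3), requires_grad=False)
        self.offset_predictor = nn.Linear(in_dim, K * 3)        # stand-in for the rigid KPConv

    def forward(self, x):                                       # x: (N, in_dim) point features
        K = self.kernel_points.shape[0]
        offsets = self.offset_predictor(x).view(-1, K, 3)
        return self.kernel_points + offsets                     # (N, K, 3): one disposition per point
```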

HuguesTHOMAS avatar Jun 23 '21 13:06 HuguesTHOMAS