pointnet.pytorch
Conv1d kernel size in transformer nets
Comparing your code to the official TensorFlow implementation, I believe the kernel size ought to be 3 for conv1 in the transformer network code (starting here).
The official implementation convolves 64 1x3 filters over each of the N 1x3 points, yielding 64 scalar values per point (i.e. an N x 64 matrix). Your code uses a filter size of 1.
Perhaps you could clarify?
Also, as a micro-optimization: instead of performing two transpose operations, you could just left-multiply by the transform, which requires no transposes at all. The network simply ends up learning the transpose of the transform we envision.
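A quick sketch of why the two forms are equivalent (tensor names here are illustrative, not taken from the repo): right-multiplying the transposed points by `trans` is the same as left-multiplying the untransposed points by `trans.T`, so a network free to learn either matrix can skip the transposes.

```python
import torch

B, N = 2, 16
x = torch.randn(B, 3, N)        # points as (batch, channels, num_points)
trans = torch.randn(B, 3, 3)    # a learned 3x3 transform per batch element

# Original pattern: two transposes around a right-multiplication.
y1 = torch.bmm(x.transpose(2, 1), trans).transpose(2, 1)

# Suggested pattern: one left-multiplication, no transposes.
# In training, the network would simply learn trans.T in place of trans;
# here we transpose explicitly just to demonstrate the equivalence.
y2 = torch.bmm(trans.transpose(2, 1), x)

print(torch.allclose(y1, y2, atol=1e-6))  # True
```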
@meder411 I think the difference between the PyTorch code and the official TF code is due to the dimension ordering: PyTorch uses N, C, H, W, whereas TF uses N, H, W, C. However, I have another question about the conv layers. In PyTorch, a Linear (FC) layer can have arbitrary intermediate dimensions, so why don't we just use FC layers to implement PointNet? The author still uses conv layers, although they lead to the same result.
@zeal-github, in my opinion, using FC layers means the input is 2-dimensional. But apart from the dimensions, there is indeed no difference between FC layers and conv layers with a 1x1 kernel.
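This equivalence is easy to verify: a `Conv1d` with kernel size 1 is just a per-point linear map, so if you copy its weights into a `Linear` layer (squeezing out the kernel dimension) the two produce identical outputs. A minimal check, with dimensions chosen to match PointNet's first layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, C_in, C_out, N = 2, 3, 64, 128

conv = nn.Conv1d(C_in, C_out, kernel_size=1)
fc = nn.Linear(C_in, C_out)

# Copy weights so both layers compute the same map.
# conv.weight has shape (C_out, C_in, 1); fc.weight has shape (C_out, C_in).
with torch.no_grad():
    fc.weight.copy_(conv.weight.squeeze(-1))
    fc.bias.copy_(conv.bias)

x = torch.randn(B, C_in, N)
out_conv = conv(x)                              # (B, C_out, N)
out_fc = fc(x.transpose(2, 1)).transpose(2, 1)  # same values
print(torch.allclose(out_conv, out_fc, atol=1e-5))  # True
```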
I think you are forgetting one important setting from the figure of the network architecture: the weights are shared among the perceptrons. In this case, nn.Conv1d is a better option than nn.Linear().
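One way to see the weight sharing concretely (a small sketch, not code from the repo): because a `Conv1d` with kernel size 1 applies the same weights to every point independently, permuting the input points simply permutes the per-point output features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(3, 64, kernel_size=1)  # one shared 3->64 perceptron

x = torch.randn(1, 3, 10)               # 10 points with 3 coordinates each
out = conv(x)

# The same weights are applied to each point independently, so
# permuting the points just permutes the per-point features.
perm = torch.randperm(10)
out_perm = conv(x[:, :, perm])
print(torch.allclose(out[:, :, perm], out_perm, atol=1e-6))  # True
```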
Why is it Conv1D? As far as I can see, in the official TensorFlow implementation the author has used only Conv2D layers; am I missing something? Could someone point out where the author has used 1D convolutions? Thanks!
@meder411, have you figured out why? I am also confused about this. The TF implementation uses a 2D Conv with a 1x3 kernel and 64 output channels, while the PyTorch implementation uses a 1D Conv with kernel size 1 and 64 output channels.
Hi @fxia22,
would you like to help us out? Like @ShrutheeshIR and @timothylimyl, I am also wondering why Conv1d is used when the original code uses Conv2d to perform the convolutions.
@ShrutheeshIR @dhirajsuvarna ,
I think it ends up being the same. I checked the parameter summary for the 1D Conv (PyTorch), and you can see that the first 1D conv has 64*3 + 64 parameters, which is the same as the 2D Conv from TF.
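The parameter counts can be checked directly. A `Conv1d(3, 64, 1)` and a TF-style `Conv2d` with a 1x3 kernel over a single input channel both hold 64*3 = 192 weights plus 64 biases:

```python
import torch.nn as nn

conv1d = nn.Conv1d(3, 64, kernel_size=1)       # PyTorch implementation style
conv2d = nn.Conv2d(1, 64, kernel_size=(1, 3))  # TF-style 1x3 kernel on an added dim

n1 = sum(p.numel() for p in conv1d.parameters())
n2 = sum(p.numel() for p in conv2d.parameters())
print(n1, n2)  # 256 256  (64*3 weights + 64 biases in both cases)
```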
Hello @timothylimyl, thanks. Yes, I noticed that in the TensorFlow implementation they add a dimension to the input in order to apply the 2D convolution, which is equivalent to the 1D convolution here without the added dimension.
As far as I know, the TensorFlow implementation uses conv2d in order to leverage the cuDNN optimizations, which were not available for conv1d.