PointCNN icon indicating copy to clipboard operation
PointCNN copied to clipboard

Failing to understand a portion of xdconv

Open aniqueakhtar opened this issue 5 years ago • 6 comments

Hello,

Recently I was going through your xdconv section of the code and I think there's a bug but I fail to understand how it is still working. Let me try to explain where I think there should be an issue.

Let me e.g. run the code on these settings copied from scannet_x8_2048_fps.py:

xconv_param_name = ('K', 'D', 'P', 'C', 'links')
xconv_params = [dict(zip(xconv_param_name, xconv_param)) for xconv_param in
                [(8, 1, -1, 32 * x, []),
                 (12, 2, 768, 64 * x, []),
                 (16, 2, 384, 96 * x, []),
                 (16, 4, 128, 128 * x, [])]]

with_global = True

xdconv_param_name = ('K', 'D', 'pts_layer_idx', 'qrs_layer_idx')
xdconv_params = [dict(zip(xdconv_param_name, xdconv_param)) for xdconv_param in
                 [(16, 4, 3, 3),
                  (16, 2, 2, 2),
                  (12, 2, 2, 1),
                  (8, 2, 1, 0)]]

Now in the pointcnn.py file, I have an issue with this part of the code:

if hasattr(setting, 'xdconv_params'):
            for layer_idx, layer_param in enumerate(setting.xdconv_params):
                tag = 'xdconv_' + str(layer_idx + 1) + '_'
                K = layer_param['K']
                D = layer_param['D']
                pts_layer_idx = layer_param['pts_layer_idx']
                qrs_layer_idx = layer_param['qrs_layer_idx']

                pts = self.layer_pts[pts_layer_idx + 1]
                fts = self.layer_fts[pts_layer_idx + 1] if layer_idx == 0 else self.layer_fts[-1]
                qrs = self.layer_pts[qrs_layer_idx + 1]
                fts_qrs = self.layer_fts[qrs_layer_idx + 1]
                P = xconv_params[qrs_layer_idx]['P']
                C = xconv_params[qrs_layer_idx]['C']
                C_prev = xconv_params[pts_layer_idx]['C']
                C_pts_fts = C_prev // 4
                depth_multiplier = 1
                fts_xdconv = xconv(pts, fts, qrs, tag, N, K, D, P, C, C_pts_fts, is_training, with_X_transformation,
                                   depth_multiplier, sorting_method)
                fts_concat = tf.concat([fts_xdconv, fts_qrs], axis=-1, name=tag + 'fts_concat')
                fts_fuse = pf.dense(fts_concat, C, tag + 'fts_fuse', is_training)
                self.layer_pts.append(qrs)
                self.layer_fts.append(fts_fuse)

So in this part of the code, the issue I have is when the layer_idx == 1 When that would happen the sizes of some variables would be: pts = N x 384 x 3 fts = N x 128 x 3 qrs = N x 384 x 3

Now once we feed these into the xconv function, we will be able to calculate indices which would have size: indices = N x 384 x K x 2

But then we use two lines of gather_nd in the xconv function that makes no sense. These lines are: (1) nn_pts = tf.gather_nd(pts, indices, name=tag + 'nn_pts') # (N, P, K, 3) (2) nn_fts_from_prev = tf.gather_nd(fts, indices, name=tag + 'nn_fts_from_prev')

I understand how the (1) line would get us nn_pts of size N x 384 x K x 3 But I fail to understand, how the (2) line is giving us nn_fts_from_prev of size N x 384 x K x 3

I believe the (2) line should have prompted an error. Since it is not giving an error, it means there's a lapse on in my understanding of how it works. Since the indices have been calculated using pts and qrs, I believe the indices should range from 0-383. However, the fts size is only 128 and therefore it should give an out of bound error whenever an indice is called that is outside the size of fts. I believe one indice would look something like [0, 300]. Which would be the 300th point on batch 0. However, this indice should not work with fts (line (2)) because fts is not large enough.

If someone can take just a few minutes to look into this and explain to me why it is working, it would be really helpful.

Thank you.

aniqueakhtar avatar Apr 19 '19 04:04 aniqueakhtar

Did you solve the problem ? I also think it may be a bug due to an error in the scannet_x8_2048_fps.py and s3dis_x8_2048_fps.py files in the given xdconv_params for this layer :

xdconv_params = [dict(zip(xdconv_param_name, xdconv_param)) for xdconv_param in [(16, 4, 3, 3), (16, 2, 2, 2), (12, 2, 2, 1), (8, 2, 1, 0)]]

where the right config should be :

xdconv_params = [dict(zip(xdconv_param_name, xdconv_param)) for xdconv_param in [(16, 4, 3, 3), (16, 2, 3, 2), (12, 2, 2, 1), (8, 2, 1, 0)]]

in order to have 128 inputs points instead of 384 for this layer.

The error may not be printed since I found this note on the tensorflow documentation :

Note that on CPU, if an out of bound index is found, an error is returned. On GPU, if an out of bound index is found, a 0 is stored in the corresponding output value. https://www.tensorflow.org/api_docs/python/tf/gather_nd

If someone can confirm it or if I am wrong it would be very nice to have some explanations.

Thank you

TomDrnd avatar May 14 '19 11:05 TomDrnd

I was unable to fix the problem. I thought there was a problem in my understanding of the code. But now that you have mentioned that comment in the documentation, I think there is a bug in the scannet_x8_2048_fps.py and s3dis_x8_2048_fps.py files.

aniqueakhtar avatar May 14 '19 21:05 aniqueakhtar

Did you solve the problem ? I also think it may be a bug due to an error in the scannet_x8_2048_fps.py and s3dis_x8_2048_fps.py files in the given xdconv_params for this layer :

xdconv_params = [dict(zip(xdconv_param_name, xdconv_param)) for xdconv_param in [(16, 4, 3, 3), (16, 2, 2, 2), (12, 2, 2, 1), (8, 2, 1, 0)]]

where the right config should be :

xdconv_params = [dict(zip(xdconv_param_name, xdconv_param)) for xdconv_param in [(16, 4, 3, 3), (16, 2, 3, 2), (12, 2, 2, 1), (8, 2, 1, 0)]]

in order to have 128 inputs points instead of 384 for this layer.

The error may not be printed since I found this note on the tensorflow documentation :

Note that on CPU, if an out of bound index is found, an error is returned. On GPU, if an out of bound index is found, a 0 is stored in the corresponding output value. https://www.tensorflow.org/api_docs/python/tf/gather_nd

If someone can confirm it or if I am wrong it would be very nice to have some explanations.

Thank you

I have the same idea with you!

jxqhhh avatar Jul 13 '19 17:07 jxqhhh

@jxqhhh and @aniqueakhtar did you guys get any further?

dandanaus avatar May 19 '20 10:05 dandanaus

@dandanaus I don't remember exactly, but I believe I assumed there was a mistake in the scannet_x8_2048_fps.py and s3dis_x8_2048_fps.py files.

aniqueakhtar avatar May 19 '20 23:05 aniqueakhtar

@dandanaus I don't remember exactly, but I believe I assumed there was a mistake in the scannet_x8_2048_fps.py and s3dis_x8_2048_fps.py files.

@aniqueakhtar did you end up changing them?

dandanaus avatar May 20 '20 00:05 dandanaus