
GRU tensor scaling

Open · KevinCain opened this issue 6 years ago • 6 comments

In 'model.py', the tensor shape is set to 1/4 of the declared maximum height and width, e.g.:

feature_shape = [FLAGS.batch_size, FLAGS.max_h/4, FLAGS.max_w/4, 32]

When we change this, how can we match the dimensions of the other inputs? When I set the sample scale to 0.5 (rather than 0.25) and change the line above to FLAGS.max_h/2, FLAGS.max_w/2, I receive the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,368,512,32] vs. shape[1] = [1,736,1024,16]

Perhaps the reference image size clashes with the other inputs; could that be the source of this error?
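For reference, this is generic TensorFlow behavior: concatenation requires every non-concat dimension to match. A minimal sketch (TF 1.x, as the repo uses; independent of MVSNet's own code) that reproduces the same error with the shapes reported above:

import numpy as np
import tensorflow as tf

ref = tf.placeholder(tf.float32, [1, None, None, 32])
src = tf.placeholder(tf.float32, [1, None, None, 16])
joined = tf.concat([ref, src], axis=3)  # channel concat needs matching H and W

with tf.Session() as sess:
    sess.run(joined, feed_dict={
        ref: np.zeros([1, 368, 512, 32], np.float32),
        src: np.zeros([1, 736, 1024, 16], np.float32)})
# InvalidArgumentError: ConcatOp : Dimensions of inputs should match:
# shape[0] = [1,368,512,32] vs. shape[1] = [1,736,1024,16]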

KevinCain avatar Jun 16 '19 23:06 KevinCain

Over the past few days I've tried to resolve the tensor shape conflict noted above, but can't seem to find where in the code the offending tensor is initialized. I've set:

tf.app.flags.DEFINE_float('sample_scale', 0.5, 
                            """Downsample scale for building cost volume (W and H).""")

and

feature_shape = [FLAGS.batch_size, FLAGS.max_h/2, FLAGS.max_w/2, 32]

Any hints on where else I need to look are welcome.
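As a sanity check (a hypothetical helper, not from the repo): the two mismatched shapes in the error are exactly the /4 and /2 versions of the same declared maximum size, so somewhere one branch of the graph is apparently still being built at quarter scale:

def declared_feature_hw(max_h, max_w, divisor):
    return max_h // divisor, max_w // divisor

# With max_h=1472, max_w=2048 (consistent with the error above):
print(declared_feature_hw(1472, 2048, 4))  # (368, 512)  -> shape[0] in the error
print(declared_feature_hw(1472, 2048, 2))  # (736, 1024) -> shape[1] in the error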

KevinCain avatar Jun 17 '19 16:06 KevinCain

Hi Kevin,

Have you solved the problem? Note that the training cameras are different from the testing cameras: the training cameras are cropped/scaled in advance, so you may want to look into the data preprocessing parts.
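For example, when an image is scaled by a factor s and cropped with pixel offsets (off_x, off_y), the camera intrinsics must be transformed to match. A generic sketch of that bookkeeping (illustrative; not the repo's actual preprocessing code):

def adjust_intrinsics(K, s, off_x=0, off_y=0):
    # K is a 3x3 pinhole intrinsic matrix as a list of lists.
    K = [row[:] for row in K]      # copy before modifying
    K[0][0] *= s                   # fx scales with the image
    K[1][1] *= s                   # fy scales with the image
    K[0][2] = K[0][2] * s - off_x  # cx scales, then shifts with the crop
    K[1][2] = K[1][2] * s - off_y  # cy scales, then shifts with the crop
    return K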

Yao

YoYo000 avatar Jun 25 '19 02:06 YoYo000

Thanks, @YoYo000. I haven't solved this yet, but your suggestion is quite salient: I noticed the difference in the preprocessing code.

If I'm interpreting correctly, the error I noted above in this thread comes from the reference view [0] being scaled differently from the test views [1..n] during testing.

I was wondering if there are hard-coded scaling factors for the reference view that I might have missed. My next step will be to skip the scaling and cropping steps and instead provide images for testing that are already sized to fit the network; only normalization will then be handled in the Python pre-processing code.
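Roughly what I have in mind for that external step, assuming OpenCV; the divisibility base and function name are illustrative rather than anything from the repo:

import cv2

def presize(image, max_h, max_w, base=32):
    h0, w0 = image.shape[:2]
    s = min(float(max_h) / h0, float(max_w) / w0)
    new_h, new_w = int(h0 * s), int(w0 * s)
    resized = cv2.resize(image, (new_w, new_h))  # cv2.resize takes (width, height)
    # Center-crop to multiples of `base` so every stride-2 stage divides evenly.
    crop_h, crop_w = (new_h // base) * base, (new_w // base) * base
    off_h, off_w = (new_h - crop_h) // 2, (new_w - crop_w) // 2
    return resized[off_h:off_h + crop_h, off_w:off_w + crop_w]

Whatever scale and crop are applied here would also need to be applied to the camera intrinsics, along the lines of the sketch in the previous comment.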

I must be missing a transformation step somewhere. ;-)

KevinCain avatar Jun 25 '19 04:06 KevinCain

When using GRU+UNetDS2GN, if I modify the hard-coded quarter-scale values for testing in 'test.py', do I need to modify the matching values for training and re-train?

Here's the relevant line, where scaling terms set the feature shape to 1/4 of input image height and width:

feature_shape = [FLAGS.batch_size, FLAGS.max_h/4, FLAGS.max_w/4, 32]

As I noted above, to simplify debugging I am cropping and scaling the input images externally, and therefore omitting the MVSGenerator scaling and cropping. That works fine with the default '/4' divisor above, but not with other values such as '/2'. I don't know why yet.

KevinCain avatar Jun 29 '19 23:06 KevinCain

@KevinCain I think I might know where the problem is... The function "UNetDS2GN" is hard-coded so that the feature map is downsized by 4 compared with the original input. So you need to modify "UNetDS2GN" and maybe also "RegNetUS0" in "mvsnet.py" so that the real network architecture (see the sketch after this list):

  1. Downsizes the original image by 2, and
  2. Generates an inferred depth map that fits the input GT depth map when calculating the loss.
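A schematic sketch of the idea (not the actual UNetDS2GN code): the /4 factor comes from stacked stride-2 convolutions, so a 2x tower needs one of those stages turned into stride 1:

import tensorflow as tf  # TF 1.x, as used by the repo

def feature_tower(images, downsize=4):
    x = tf.layers.conv2d(images, 8, 3, strides=1, padding='same', activation=tf.nn.relu)
    x = tf.layers.conv2d(x, 16, 3, strides=2, padding='same', activation=tf.nn.relu)       # /2
    stride = 2 if downsize == 4 else 1
    x = tf.layers.conv2d(x, 32, 3, strides=stride, padding='same', activation=tf.nn.relu)  # /4 or /2
    return x

With a 2x tower, a declared feature_shape of [batch, max_h/2, max_w/2, 32] matches what the network emits, but the same change must be mirrored at training time (with re-training), since the GT depth maps are prepared at the training resolution.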

YoYo000 avatar Jul 01 '19 07:07 YoYo000

Hi @KevinCain, have you solved the problem? I modified sample_scale from 0.25 to 0.5 in test.py to test the same dataset: 0.25 gives a reasonable depth map, but 0.5 gives a wrong depth map.

zjd1988 avatar May 14 '20 09:05 zjd1988