
Code to reproduce NYU RGBD results / input pipeline

Open meyerjo opened this issue 6 years ago • 4 comments

Hi, thanks for your repo. It would be nice if you could also provide the code / input pipeline you used to run the NYU RGB-D experiments (similar to #4 ). To me it is not entirely clear how you combined the different modalities. Best,

meyerjo avatar Aug 09 '19 06:08 meyerjo

@HobbitLong Since I don't know whether there is a timeline for this issue, and just to be sure I understand everything correctly: to replicate the NYU RGB-D experiments, the following steps would be required, wouldn't they?

1. Initialize the different models per modality:

   ```python
   feat_l, feat_ab, feat_depth = model(inputs)
   ```

2. Create separate contrast objects for the different pairs (e.g. in the core-view scheme `contrast_l_ab`, `contrast_l_depth`), which would enable something like:

   ```python
   out_l, out_ab = contrast_l_and_ab(feat_l, feat_ab, index)
   out_l_2, out_depth = contrast_l_and_depth(feat_l, feat_depth, index)
   ```

3. For each modality, create a criterion object (twice for the core modality "L"), which would lead to something like:

   ```python
   l_loss_from_l_and_depth = criterion_l2(out_l_2)
   depth_loss_from_l_and_depth = criterion_depth(out_depth)
   ```

4. Add the resulting losses to the overall loss term.
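The steps above could be sketched end-to-end roughly as follows. This is a minimal NumPy sketch assuming a generic InfoNCE-style criterion; the names (`info_nce`, `feat_l`, `feat_ab`, `feat_depth`) are illustrative and not the repo's actual API:

```python
# Hypothetical sketch of the core-view pairing: contrast the core
# modality "L" against each other modality and sum the losses.
import numpy as np

def info_nce(anchor, positives, temperature=0.07):
    """InfoNCE over a batch: row i of `anchor` is matched with row i of
    `positives`; all other rows in the batch act as negatives."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature          # (batch, batch) similarity logits
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))      # cross-entropy, targets on diagonal

rng = np.random.default_rng(0)
batch, dim = 8, 128
feat_l = rng.standard_normal((batch, dim))      # core view "L"
feat_ab = rng.standard_normal((batch, dim))     # color channels
feat_depth = rng.standard_normal((batch, dim))  # depth modality

# core-view scheme: L vs. ab, plus L vs. depth, summed into one loss
loss = info_nce(feat_l, feat_ab) + info_nce(feat_l, feat_depth)
print(loss)
```

In the actual repo the pairing goes through memory-bank contrast objects and separate NCE criteria per modality, but the aggregation into a single summed loss is the same idea.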

Is this the correct way to go forward? How exactly did you do the patch-based training on the "L" modality? Could you provide some hyper-parameters for that?

Thanks for the otherwise very nice repo.

meyerjo avatar Aug 16 '19 13:08 meyerjo

@meyerjo ,

To your question: we used a "patch-based contrastive objective" for this task; please refer to Section 3.5.2.

That being said, for each modality we extract a global feature as well as local features. Then we contrast (1) the global feature from modality A against the local features from modality B, and (2) the global feature from modality B against the local features from modality A. We build this global-local contrastive loss exactly as in the DIM paper. The code for this loss can be found here.

The way of extracting the global and local features can be found here.
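One common way to get both feature types from a single conv map is to keep the spatial positions as local features and average-pool them into the global feature. This is only a sketch of that common choice (the repo's exact head may differ):

```python
# Minimal sketch: local features = spatial positions of a conv feature
# map; global feature = spatial average pool of the same map.
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 64, 7, 7))  # (batch, channels, H, W) conv map

# locals: one 64-d vector per spatial position -> (batch, 49, 64)
local_feats = fmap.reshape(4, 64, -1).transpose(0, 2, 1)
# global: average over H and W -> (batch, 64)
global_feat = fmap.mean(axis=(2, 3))
print(global_feat.shape, local_feats.shape)
```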

And then we sum up those pair-wise losses as described in our paper.
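The global-local contrast described above can be sketched as follows. This is a hedged, DIM-style NumPy sketch, assuming each encoder yields one global vector per image and a grid of local vectors; the function name and shapes are illustrative, not the repo's code:

```python
# DIM-style global-to-local contrast between two modalities: score each
# image's global vector against every local vector in the batch; the
# positives are the locals from the same image.
import numpy as np

def global_local_nce(global_feat, local_feats, temperature=0.07):
    b, n, d = local_feats.shape
    g = global_feat / np.linalg.norm(global_feat, axis=1, keepdims=True)
    l = local_feats / np.linalg.norm(local_feats, axis=2, keepdims=True)
    # (batch, batch, n): similarity of each global to each image's locals
    logits = np.einsum('bd,cnd->bcn', g, l) / temperature
    logits = logits.reshape(b, b * n)       # flatten locals across the batch
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives: the n locals belonging to the matching image
    pos = np.stack([log_prob[i, i * n:(i + 1) * n] for i in range(b)])
    return -pos.mean()

rng = np.random.default_rng(0)
b, n, d = 4, 7 * 7, 64                      # batch, locals per image, dim
glob_a = rng.standard_normal((b, d))        # global feature, modality A
loc_b = rng.standard_normal((b, n, d))      # local features, modality B
glob_b = rng.standard_normal((b, d))
loc_a = rng.standard_normal((b, n, d))

# symmetric pair: global A vs. locals B, plus global B vs. locals A
loss = global_local_nce(glob_a, loc_b) + global_local_nce(glob_b, loc_a)
print(loss)
```

With more than two modalities, one such symmetric term per modality pair would be summed, matching the pair-wise aggregation described in the paper.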

HobbitLong avatar Aug 18 '19 01:08 HobbitLong

@HobbitLong thanks for this additional info.

Just to confirm: is the use of a patch-wise method here motivated only by the small dataset size?

I am also trying to find an explanation of the global/local feature concept in your CMC paper - is this a deviation from the method presented there?

Many thanks!

jnyjxn avatar Mar 10 '20 10:03 jnyjxn

@jnyjxn ,

Yes, the NYU dataset has fewer than 2k images.

This is different from the ImageNet experiment, but it is described in the supplementary material of the paper.

HobbitLong avatar Mar 10 '20 17:03 HobbitLong