
Not Comparable Results

Open SII-Ferenas opened this issue 3 years ago • 5 comments

Hi, I have trained GroupViT under your default settings on RedCaps+GCC3M+GCC12M. I ran the experiment twice and only reached 26.8% and 27.9% mIoU. From earlier discussions in the closed issues, I gather this may be caused by the training data size. Could you please tell me the specific data sizes of RedCaps, GCC3M, GCC12M, and YFCC100M?

Sep 30 '22 02:09 SII-Ferenas

Hi @Ferenas

What is your global batch size? A large enough batch size is crucial for contrastive learning to work. In GroupViT, we use 4096.

The detailed size of each dataset can be found here:

https://github.com/xvjiarui/GroupViT/blob/dbaf9e85a6903c948df05f292d0b2091e9852f94/configs/default.yml#L11-L30

Sep 30 '22 05:09 xvjiarui

Thanks for your feedback! I used the default batch size and trained on 8 GPUs, so the global batch size is 2048, which I don't think is the problem. However, I noticed that my shard counts differ from yours and are smaller. For instance, for GCC3M you have 436 shards but I only have 331. Is that fine? I would also like to know how much storage the data shards take up on your machine. For instance, GCC12M is reported to be 331GB, but you say "GCC12M is 1.8T" [https://github.com/NVlabs/GroupViT/issues/16#issuecomment-1122995696].

Sep 30 '22 05:09 SII-Ferenas
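
For reference, the global batch size in data-parallel training is the per-GPU batch size multiplied by the number of GPUs. The sketch below only works through that arithmetic. It assumes a per-GPU batch size of 256, which is inferred from the 2048 total on 8 GPUs quoted above rather than read from the config, and `global_batch_size` is a hypothetical helper, not part of the GroupViT code.

```python
def global_batch_size(per_gpu_batch: int, num_gpus: int, num_nodes: int = 1) -> int:
    """Number of samples that contribute to one optimizer step across all workers."""
    return per_gpu_batch * num_gpus * num_nodes


# Assumption: 256 per GPU, inferred from 2048 / 8 in the comment above.
per_gpu = 256

print(global_batch_size(per_gpu, num_gpus=8))   # 2048
print(global_batch_size(per_gpu, num_gpus=16))  # 4096
```

Under that assumption, 8 GPUs give half of the 4096 mentioned for GroupViT, and twice as many workers would be needed to match it.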

Hi @Ferenas

The difference comes from saving the images at different sizes. We resize to 224 during training anyway, so it won't affect the result.

However, it's a common issue that GCC12M is incomplete, since many URLs have become invalid. You may need to check how many images you actually downloaded.

Sep 30 '22 06:09 xvjiarui
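
One way to check how many images were actually downloaded is to count the image files inside the shards. The sketch below assumes the shards are plain `.tar` archives with one image file per sample; the set of suffixes and both function names are hypothetical, and this is not a script from the GroupViT repository.

```python
import tarfile
from pathlib import Path

# Assumption: images are stored under one of these file suffixes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def count_images_in_shard(shard_path) -> int:
    """Count regular files in one tar shard whose name has an image suffix."""
    count = 0
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            if member.isfile() and Path(member.name).suffix.lower() in IMAGE_SUFFIXES:
                count += 1
    return count


def count_images(shard_dir, pattern: str = "*.tar") -> int:
    """Sum the image counts over every shard in a directory that matches the pattern."""
    return sum(count_images_in_shard(p) for p in sorted(Path(shard_dir).glob(pattern)))
```

Calling `count_images` on the directory that holds the GCC12M shards and comparing the total with the size listed in the config should show how much of the dataset is missing.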

Thanks for your reply. I downloaded the datasets again and found that the gap is mainly due to the size of the datasets. BTW, I have one question about convert_coco_object.py: could you please explain clsID_to_trID a little? Why do the label ids need to be remapped here?

Oct 08 '22 09:10 SII-Ferenas

Hi @Ferenas

It maps the segmentation label ids to contiguous integers.

Oct 08 '22 14:10 xvjiarui
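
To illustrate this kind of remapping: annotation files often use class ids with gaps, while a segmentation model indexes its classes as 0 to N-1. The sketch below uses made-up ids, not the actual table from convert_coco_object.py. `build_id_map` and `remap_mask` are hypothetical names, and mapping unknown ids to 255 is an assumption (a common ignore value), not something stated in this thread.

```python
def build_id_map(class_ids):
    """Map each original class id to a contiguous training id, in sorted order."""
    return {cls_id: train_id for train_id, cls_id in enumerate(sorted(class_ids))}


def remap_mask(mask, id_map, ignore_index=255):
    """Replace every label in a 2D mask; ids missing from the map become ignore_index."""
    return [[id_map.get(label, ignore_index) for label in row] for row in mask]


# Made-up class ids with gaps, for illustration only.
sparse_ids = [1, 2, 5, 9]
id_map = build_id_map(sparse_ids)
print(id_map)  # {1: 0, 2: 1, 5: 2, 9: 3}

mask = [[1, 5], [9, 7]]  # 7 is not a known class id
print(remap_mask(mask, id_map))  # [[0, 2], [3, 255]]
```

With contiguous ids, a label can be used directly as an index into the model's class dimension, which ids with gaps would not allow.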