
Why not ResNet

Open · FangmingZhou opened this issue 5 years ago · 4 comments

Notice that the results in the paper 'Deep Cross-Modal Projection Learning for Image-Text Matching' are {top-1 = 49.37%, top-10 = 79.27%}, while the results in this project are {top-1 = 42.999%, top-10 = 67.869%}, which come from a model based on MobileNet. So why not provide a new version based on ResNet? ^^ It would be greatly helpful for us beginners! Thanks a lot!

FangmingZhou avatar Dec 17 '19 10:12 FangmingZhou

Hello, are your results based on CUHK-PEDES?

wxh001qq avatar Mar 04 '20 07:03 wxh001qq

> Hello, are your results based on CUHK-PEDES?

yes

FangmingZhou avatar Mar 06 '20 09:03 FangmingZhou

> Hello, are your results based on CUHK-PEDES?
>
> yes

I found that whether or not you use nn.DataParallel() strongly influences the result: without nn.DataParallel() I got about {top-1 = 31%, top-10 = 55%}, while with it I got about {top-1 = 42%, top-10 = 67%}.
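For reference, a minimal sketch of how nn.DataParallel is typically applied (the toy model and sizes here are illustrative assumptions, not the project's code). One plausible reason for the gap is that DataParallel splits each batch across GPUs, so layers like BatchNorm compute statistics on smaller per-device chunks, which changes training dynamics relative to a single device:

```python
import torch
import torch.nn as nn

# Hypothetical toy encoder standing in for the MobileNet image branch.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

# nn.DataParallel scatters each input batch across all visible GPUs and
# gathers the outputs back on the default device. With one GPU (or CPU)
# it is a no-op wrapper, so behavior only diverges on multi-GPU machines.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(8, 128)   # a batch of 8 feature vectors
out = model(x)
print(out.shape)          # torch.Size([8, 32])
```

If the multi-GPU effective batch size is what matters, matching it on a single GPU (or switching to DistributedDataParallel with synchronized BatchNorm) would be a way to test that hypothesis.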

wxh001qq avatar Mar 06 '20 14:03 wxh001qq

> Hello, are your results based on CUHK-PEDES?
>
> yes
>
> I found that whether or not you use nn.DataParallel() strongly influences the result: without nn.DataParallel() I got about {top-1 = 31%, top-10 = 55%}, while with it I got about {top-1 = 42%, top-10 = 67%}.

I haven't used this parallel method, and I haven't seen anyone online mention that it leads to different results. You may need to ask someone else.

FangmingZhou avatar Mar 07 '20 07:03 FangmingZhou