Image features for Conceptual Captions
Could you release the Conceptual Captions features? They may be too heavy to upload, but I would really like to retrain based on your code.
By the way, I have a question about your study on the number of streams. In your two-stream version, the text stream uses 12 BERT layers and the image stream uses 6 image-BERT layers, and the two streams then pass through a 6-layer connection module. In the single-stream version, the two modalities share 12 BERT layers for encoding. I don't think these two models are comparable.
Thanks a lot!
1: There is no way to share the features; they are about 2 TB. Do you have any suggestions? You can download the image dataset and extract the features yourself.
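For extracting the features yourself, the first step is fetching the images. Below is a minimal sketch, assuming the official Conceptual Captions TSV format (one `caption<TAB>url` pair per line); the file name, output layout, and `limit` parameter are illustrative, and the actual feature extractor (e.g. the detector this repo uses) is not shown.

```python
import os
import urllib.request


def parse_tsv_line(line):
    """Split one Conceptual Captions TSV line into (caption, url)."""
    caption, url = line.rstrip("\n").split("\t", 1)
    return caption, url


def download_images(tsv_path, out_dir, limit=100):
    """Download the first `limit` images listed in the TSV into out_dir.

    Many Conceptual Captions URLs are dead, so failures are skipped
    rather than aborting the whole run.
    """
    os.makedirs(out_dir, exist_ok=True)
    for i, line in enumerate(open(tsv_path, encoding="utf-8")):
        if i >= limit:
            break
        caption, url = parse_tsv_line(line)
        try:
            urllib.request.urlretrieve(url, os.path.join(out_dir, f"{i:08d}.jpg"))
        except Exception:
            pass  # broken link; skip
```

After downloading, the images would be run through the repo's feature extractor to produce the region features used for pretraining.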
2: Why do you think they are not comparable? What would a comparable setup look like? Thanks.
Hi, could you please release the features for a small sampled set of Conceptual Captions images? It would be very helpful for checking the correctness of our own computed features. Thank you!
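If such a reference sample were released, correctness could be checked with a quick similarity comparison. This is a hypothetical sketch assuming each image's region features are a `(num_boxes, dim)` array with rows in the same order in both files:

```python
import numpy as np


def mean_cosine_similarity(ref, mine):
    """Mean cosine similarity between matching rows of two feature matrices.

    A value near 1.0 suggests the recomputed features match the
    reference; a noticeably lower value points to a pipeline mismatch
    (different detector weights, preprocessing, or box ordering).
    """
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    mine = mine / np.linalg.norm(mine, axis=1, keepdims=True)
    return float((ref * mine).sum(axis=1).mean())
```

Comparing against even a few dozen reference images this way would be enough to validate an extraction setup without sharing the full 2 TB.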