Image features for Conceptual Captions
Could you release the Conceptual Captions features? They may be too heavy to upload, but I would really like to retrain based on your code.
By the way, I have a question about your study on the number of streams. In your two-stream version, the text stream uses 12 BERT layers and the image stream uses 6 image-BERT layers, and the two streams then pass through a 6-layer connection module. In the single-stream version, the two modalities share 12 BERT layers for encoding. I don't think these two models are comparable.
Thanks a lot!
1: There is no way to share the features; they are about 2 TB. Do you have any suggestions? You can download the image dataset and extract the features yourself.
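For extracting the features yourself, the first step is fetching the images. Below is a minimal sketch, assuming the official Conceptual Captions TSV format (one `caption<TAB>url` pair per line); the file name, output layout, and `limit` parameter are illustrative, and the actual feature extractor (e.g. the detector this repo uses) is not shown.

```python
import os
import urllib.request


def parse_tsv_line(line):
    """Split one Conceptual Captions TSV line into (caption, url)."""
    caption, url = line.rstrip("\n").split("\t", 1)
    return caption, url


def download_images(tsv_path, out_dir, limit=100):
    """Download the first `limit` images listed in the TSV into out_dir.

    Many Conceptual Captions URLs are dead, so failures are skipped
    rather than aborting the whole run.
    """
    os.makedirs(out_dir, exist_ok=True)
    for i, line in enumerate(open(tsv_path, encoding="utf-8")):
        if i >= limit:
            break
        caption, url = parse_tsv_line(line)
        try:
            urllib.request.urlretrieve(url, os.path.join(out_dir, f"{i:08d}.jpg"))
        except Exception:
            pass  # broken link; skip
```

After downloading, the images would be run through the repo's feature extractor to produce the region features used for pretraining.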
2: Why do you think they are not comparable? What would a comparable setup look like? Thanks.
Hi, could you please release the features for a small sampled set of Conceptual Captions images? It would be very helpful for checking the correctness of our own computed features. Thank you!
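If such a reference sample were released, correctness could be checked with a quick similarity comparison. This is a hypothetical sketch assuming each image's region features are a `(num_boxes, dim)` array with rows in the same order in both files:

```python
import numpy as np


def mean_cosine_similarity(ref, mine):
    """Mean cosine similarity between matching rows of two feature matrices.

    A value near 1.0 suggests the recomputed features match the
    reference; a noticeably lower value points to a pipeline mismatch
    (different detector weights, preprocessing, or box ordering).
    """
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    mine = mine / np.linalg.norm(mine, axis=1, keepdims=True)
    return float((ref * mine).sum(axis=1).mean())
```

Comparing against even a few dozen reference images this way would be enough to validate an extraction setup without sharing the full 2 TB.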