Results: 4 issues by Mike

I want to input only the text features or only the video features into UniVL. The paper says that a single transformer combines the text representation **T** and the video representation **V**. Could you...

In the paper, you mention in Table 2 that you use this for image captioning. However, I do not see image captioning in this GitHub repository. Can you tell...

Hi, I am very happy to see the pre-trained model on Hugging Face. I have a small question about AMRBART (AMR2Text): what is the input for this? Does that mean we still...

In the second section, I do not get a new folder (dataset_ytsum) in Dataset, and I do not have summary.pkl after executing model_visual-extractor.ipynb. Could you tell me where I can get...