Mike
I want to input only text features or only video features into UniVL. The paper says that one transformer combines the text representation **T** and the video representation **V**. Could you...
In the paper, you mention in Table 2 that you use this for image captioning. However, I do not see image captioning in this GitHub repository. Can you tell...
Hi, I am very happy to see the pre-trained model on Hugging Face. I have a small question about AMRBART (AMR2Text): what is the input for this? Does that mean we still...
In the second section, I do not have the new folder (dataset_ytsum) in Dataset, and I do not have summary.pkl after executing model_visual-extractor.ipynb. Could you tell me where I can get...