Video-Cap
Where can I get the train data?
How can I train the model by myself?
Many video description datasets are available online, such as MSR-VTT: https://www.microsoft.com/en-us/research/publication/msr-vtt-large-video-description-dataset-bridging-video-language-supplementary-material/