Audio-Visual-Video-Caption

PyTorch implementation of an audio-visual fusion video captioning model.

Issues (6)

Hi, could you tell me how to split the MSR-VTT dataset? Many thanks!
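For reference, the widely used standard MSR-VTT split partitions the 10,000 clips by video id into 6,513 train / 497 validation / 2,990 test. Below is a minimal sketch of that split; the `videodatainfo.json` filename and its `videos`/`video_id` keys follow the official MSR-VTT annotation layout and may differ from how this repo stores the data:

```python
import json

# Standard MSR-VTT benchmark split, keyed on the numeric part of the video id.
def split_name(video_id: int) -> str:
    if video_id <= 6512:
        return "train"   # video0 .. video6512  (6,513 clips)
    if video_id <= 7009:
        return "val"     # video6513 .. video7009 (497 clips)
    return "test"        # video7010 .. video9999 (2,990 clips)

# Bucket the annotation file's videos into the three splits.
with open("videodatainfo.json") as f:
    info = json.load(f)

splits = {"train": [], "val": [], "test": []}
for video in info["videos"]:
    vid = int(video["video_id"].replace("video", ""))
    splits[split_name(vid)].append(video["video_id"])

print({name: len(ids) for name, ids in splits.items()})
# expected: {'train': 6513, 'val': 497, 'test': 2990}
```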

Hi, why is the multi-level attention being used during encoding? According to the multimodal attention paper, it is applied only during decoding.
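For context, in the decoder-side formulation the issue refers to, attention weights over each modality's encoder outputs are recomputed at every decoding step from the current decoder state; nothing is attended during encoding. A minimal, hypothetical sketch of one such per-step attention module (names and shapes are assumptions, not this repo's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStepAttention(nn.Module):
    """Additive attention over encoder outputs, queried once per decoder step."""
    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int = 256):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim)
        self.dec_proj = nn.Linear(dec_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden:  (batch, dec_dim)       current decoder state
        # enc_outputs: (batch, time, enc_dim) audio or visual features
        energy = torch.tanh(self.enc_proj(enc_outputs)
                            + self.dec_proj(dec_hidden).unsqueeze(1))
        weights = F.softmax(self.score(energy).squeeze(-1), dim=1)
        # Weighted sum -> one context vector for this decoding step.
        return (weights.unsqueeze(-1) * enc_outputs).sum(dim=1)

# At each decoding step, one such module per modality yields an audio context
# and a visual context; a second, modality-level attention can then weight
# the two contexts before the next word is predicted.
```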

Hi, the dataset isn't available at the links you mentioned in an earlier issue. Could you please advise?

Hi, I would like to try your video captioning model on my own videos. Could you please provide the pre-trained model?

n_layers=opt['num_layers'], rnn_cell=opt['rnn_type'], rnn_dropout_p=opt['rnn_dropout_p']).cuda() raises KeyError: 'rnn_type'
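The KeyError means the opt dict contains no 'rnn_type' entry. Two possible workarounds are sketched below; the flag name, default, and choices are assumptions, so check the repo's option-parsing code for the name it actually defines:

```python
# Option 1: read the key defensively, falling back to a default cell type.
rnn_cell = opt.get('rnn_type', 'lstm')  # 'gru' is the usual alternative

# Option 2: define the missing command-line flag (hypothetical; add it to
# the repo's opts file if it is not already there under another name).
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--rnn_type', type=str, default='lstm',
                    choices=['lstm', 'gru'],
                    help='RNN cell type used by the encoder/decoder')
```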

Thanks for your work. Could you provide the S2VT model?