xmodaler
Possible implementation in Google Colab.
I need to generate a caption for a video. What would be the easiest way to use xmodaler for that? I don't need to train a model, only to run one to generate the caption. I haven't found a working example in the documentation. Would it be possible to provide a ready-to-use Colab notebook?
The project is mainly focused on training. To generate a caption from a raw video, you can start from configs/image_caption/transformer/clip_transformer_test_raw.yaml and write a new video loader (similar to MSCoCoRawDataset). The loader needs to decode each video into frames and then extract features from those frames.
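A minimal sketch of what such a video loader could look like, assuming you supply two callables yourself: read_frames(path), which decodes a video into a list of frames (for example by wrapping OpenCV's cv2.VideoCapture), and extract_features(frame), which encodes one frame (presumably the same CLIP visual encoder the clip_transformer config uses). The class name RawVideoDataset, the uniform sampling strategy, and the returned dict keys are all hypothetical; they are not part of xmodaler's API and would need to be adapted to match MSCoCoRawDataset and the keys the config expects.

```python
from typing import Any, Callable, Dict, List, Sequence


def sample_frame_indices(num_total: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread uniformly over [0, num_total)."""
    if num_total <= 0:
        raise ValueError("video has no frames")
    if num_samples >= num_total:
        # Short video: keep every frame.
        return list(range(num_total))
    step = num_total / num_samples
    # Take the center frame of each of num_samples equal-length segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


class RawVideoDataset:
    """Hypothetical loader: video path -> sampled frames -> per-frame features."""

    def __init__(
        self,
        video_paths: Sequence[str],
        read_frames: Callable[[str], List[Any]],
        extract_features: Callable[[Any], Any],
        num_frames: int = 8,
    ) -> None:
        self.video_paths = list(video_paths)
        self.read_frames = read_frames            # assumption: user-supplied decoder
        self.extract_features = extract_features  # assumption: user-supplied encoder
        self.num_frames = num_frames

    def __len__(self) -> int:
        return len(self.video_paths)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        path = self.video_paths[idx]
        frames = self.read_frames(path)
        indices = sample_frame_indices(len(frames), self.num_frames)
        feats = [self.extract_features(frames[i]) for i in indices]
        # Placeholder keys: rename them to whatever the xmodaler config expects.
        return {"video_id": path, "frame_indices": indices, "frame_feats": feats}


if __name__ == "__main__":
    # Stand-ins for a real decoder and encoder, just to exercise the pipeline.
    fake_reader = lambda path: list(range(100))  # a "video" of 100 frames
    fake_encoder = lambda frame: [float(frame)]  # a 1-d "feature" per frame
    ds = RawVideoDataset(["clip.mp4"], fake_reader, fake_encoder, num_frames=4)
    print(ds[0]["frame_indices"])
```

Sampling a fixed number of frames uniformly keeps the feature sequence length constant regardless of video duration, which is usually what a captioning model trained on fixed-size inputs expects; other strategies (every k-th frame, shot-based keyframes) would slot into sample_frame_indices the same way.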