Can OFA support video-language tasks such as video captioning?

Open dinglei8908 opened this issue 3 years ago • 1 comments

Suppose we can extract several frames from a video. Are there any suggestions for how to handle this?

Oct 12 '22 07:10 dinglei8908

Not done yet, but it should be possible. We still need to figure out whether this requires changes to pretraining or whether the pretrained models can simply be adapted to the task. The simplest approach might be to treat the average of the frames as a single image.

Oct 14 '22 09:10 JustinLin610
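
The frame-averaging idea suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not something OFA provides: the helper names `sample_frame_indices` and `average_frames` are hypothetical, the frames are assumed to be already decoded into a `(T, H, W, C)` uint8 NumPy array, and the uniform sampling step is an extra assumption not mentioned in the thread.

```python
import numpy as np


def sample_frame_indices(num_frames, num_samples):
    """Return evenly spaced frame indices (hypothetical helper)."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    num_samples = min(num_samples, num_frames)
    # Spread the samples from the first frame to the last one.
    return np.linspace(0, num_frames - 1, num=num_samples).round().astype(int)


def average_frames(frames):
    """Average a (T, H, W, C) uint8 stack into one (H, W, C) uint8 image."""
    frames = np.asarray(frames)
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")
    # Average in float to avoid uint8 overflow, then round back to uint8.
    mean = frames.astype(np.float32).mean(axis=0)
    return np.clip(np.rint(mean), 0, 255).astype(np.uint8)


# Toy "video": three constant 4x4 RGB frames with pixel values 0, 100, 200.
video = np.stack([np.full((4, 4, 3), v, dtype=np.uint8) for v in (0, 100, 200)])
indices = sample_frame_indices(len(video), 3)
image = average_frames(video[indices])
```

The resulting `image` is an ordinary single image, so in principle it could be fed to an existing image-captioning pipeline. Whether that gives useful captions is untested here, and averaging discards temporal order, so it is at best a baseline.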