mPLUG-Owl
How is video input processed? The repository contains no video preprocessing code, but the Hugging Face Space supports video input.
Hi, the model on the Hugging Face Space is an advanced version of mPLUG-Owl that natively supports video input through a temporal module, rather than treating a video as multiple separate frames. A video is tokenized into 65 tokens, the same as an image. We will release it very soon.
But the paper doesn't mention this temporal-related module. Will this module be detailed in the new version of the paper coming soon?
Yes. Specifically, we add the local temporal modeling module proposed in mPLUG-2 and add learnable trajectory queries to the visual abstractor module. We will include this in the new version of the paper.
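For readers following along, here is a minimal sketch of why a query-based abstractor can emit the same 65 tokens for a video as for an image: a fixed set of learnable queries cross-attends over all patch features, so the output length depends on the number of queries and not on the number of frames. This is not the released implementation. The split into 64 queries plus one end token, the function names (`cross_attend`, `abstract_video`), and the toy dimensions are all assumptions made for illustration, and the local temporal modeling module and trajectory queries are not modeled at all.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Single-head dot-product cross-attention.

    Each query produces one output vector: a weighted average of all features.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, f)) for f in features]
        weights = softmax(scores)
        outputs.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return outputs


def abstract_video(frames, queries, eos_token):
    """Hypothetical abstractor: pool patch features from all frames into a fixed token set.

    frames: list of frames, each a list of patch feature vectors.
    Returns len(queries) + 1 tokens regardless of how many frames are given.
    """
    features = [patch for frame in frames for patch in frame]
    return cross_attend(queries, features) + [eos_token]


random.seed(0)
DIM, NUM_QUERIES = 8, 64  # assumed toy sizes; 64 queries + 1 end token = 65


def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]


queries = [rand_vec() for _ in range(NUM_QUERIES)]   # stand-in for learned queries
eos_token = rand_vec()                               # stand-in for a learned end token
video = [[rand_vec() for _ in range(16)] for _ in range(4)]  # 4 frames x 16 patches

tokens = abstract_video(video, queries, eos_token)
print(len(tokens))  # 65
```

Passing two frames or four frames gives the same token count, which is the only property this sketch is meant to show.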
Really need it!
We will release the video version this week!
Awesome! I can't wait for it! Will you release both the code and the paper?
We will not update the paper, but we will include a specification of the model's design in the video branch. The code and weights will be released.
Hi, did you release the video version? I don't see any updates in this project.
Sorry about that. The code and weights are under review; we will release them once the review is done.
Hi, this is wonderful work! When will you release the video version?