mPLUG-Owl What is the process of video input? There is no preprocessing code in your code, but Hugging Face spaces supports video input.

What is the process of video input? There is no preprocessing code in your code, but Hugging Face spaces supports video input.

Open sdjhshbswp opened this issue 1 year ago • 10 comments

May 27 '23 07:05 sdjhshbswp

Hi, the model on the Hugging Face Space is an advanced version of mPLUG-Owl that natively supports video input through a temporal module, rather than treating a video as multiple separate frames. The video is tokenized into 65 tokens, the same way an image is. We will release it very soon.

May 28 '23 08:05 MAGAer13

But the paper doesn't mention this temporal-related module. Will this module be detailed in the new version of the paper coming soon?

May 28 '23 09:05 sdjhshbswp

Yes. Specifically, we add the local temporal modeling module proposed in mPLUG-2 and add trajectory learnable queries to the visual abstractor module. We will include this in the new version of the paper.

May 28 '23 09:05 MAGAer13
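
Editor's note: the replies above do not spell out how a clip of any length ends up as a fixed 65 tokens. One general way to get a fixed token count is to let a fixed set of learnable queries cross-attend over the patch features of all frames. The sketch below illustrates only that idea in plain Python. It is not the mPLUG-Owl implementation: it has no learned projections and no temporal module, and the names `abstract_video`, `random_vectors`, `DIM`, and `NUM_QUERIES` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def abstract_video(frame_features, queries):
    """Cross-attend a fixed set of queries over every patch of every frame.

    frame_features: list of frames, each a list of patch vectors.
    queries: list of query vectors (random here, learned in a real model).
    Returns one vector per query, so the token count equals len(queries)
    regardless of how many frames are passed in.
    """
    # Flatten time and space into a single key/value sequence.
    patches = [patch for frame in frame_features for patch in frame]
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    tokens = []
    for query in queries:
        weights = softmax([dot(query, patch) * scale for patch in patches])
        # Each token is the attention-weighted average of the patch vectors.
        token = [sum(w * patch[i] for w, patch in zip(weights, patches))
                 for i in range(dim)]
        tokens.append(token)
    return tokens


def random_vectors(rng, count, dim):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 8            # toy feature size, chosen only for the demo
NUM_QUERIES = 65   # token count quoted in the thread

queries = random_vectors(rng, NUM_QUERIES, DIM)
short_clip = [random_vectors(rng, 4, DIM) for _ in range(2)]   # 2 frames
long_clip = [random_vectors(rng, 4, DIM) for _ in range(16)]   # 16 frames

short_tokens = abstract_video(short_clip, queries)
long_tokens = abstract_video(long_clip, queries)
print(len(short_tokens), len(long_tokens))  # prints "65 65"
```

Both clips come out as 65 tokens because the output length is set by the number of queries, not by the number of frames.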

Really need it!

Jun 14 '23 03:06 feymanwang

We will release the video version this week!

Jun 14 '23 03:06 MAGAer13

Awesome! I can't wait for it! Will you release both the code and the paper?

Jun 14 '23 07:06 feymanwang

We will not update the paper, but we will include the specification of the model's design in the video branch. The code and weights will be released.

Jun 14 '23 07:06 MAGAer13

Hi, did you release the video version? I don't see any updates in this project.

Jun 19 '23 06:06 feymanwang

Sorry about that. The code and weights are under review; we will release them once the review is done.

Jun 19 '23 06:06 MAGAer13

Hi, this is wonderful work! When will you release the video version?

Jun 24 '23 03:06 LinB203
