question about vision encoder

Open King-king424 opened this issue 1 year ago • 2 comments

Is the vision encoder used here UMT-L or InternVideo2-1B? I saw that the Mistral version in InternVideo2 had similar results to the one here.

King-king424 avatar Jun 17 '24 11:06 King-king424

Hi! We released UMT-L since it runs faster.

Andy1621 avatar Jun 17 '24 13:06 Andy1621

Hello! Will you release an InternVideo2 version in the future? Thanks~

yepzhang avatar Sep 20 '24 08:09 yepzhang

@yepzhang Sure, you can find internvideo2-chat on Hugging Face.

yinanhe avatar Oct 11 '24 07:10 yinanhe