VILA What is the conv_mode for VILA1.5-40b in video inference?

Open stdKonjac opened this issue 1 year ago • 2 comments

Hi, what is the conv_mode for VILA1.5-40b in video inference? Additionally, I noticed that the <video> token seems to have no effect in video inference. The eval code automatically prepends several <image> tokens while leaving the <video> token untouched. For example:

<image>
<image>
<image>
<video>
Please describe the video

Is this behavior normal? I'd appreciate a timely response :) @Lyken17
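To make the observation above concrete, here is a minimal sketch of the expansion as it appears from the outside. It is not taken from the VILA code base: `expand_video_prompt` and its parameters are hypothetical names, and the assumption is simply that one `<image>` token is prepended per sampled frame while the rest of the prompt, including `<video>`, is passed through unchanged.

```python
def expand_video_prompt(prompt: str, num_frames: int, image_token: str = "<image>") -> str:
    """Prepend one image token per sampled frame (hypothetical helper).

    The original prompt, including any <video> token, is left untouched,
    which mirrors the behavior described in this issue.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    # One token per frame, each on its own line.
    frame_tokens = "".join(f"{image_token}\n" for _ in range(num_frames))
    return frame_tokens + prompt


prompt = "<video>\nPlease describe the video"
expanded = expand_video_prompt(prompt, num_frames=3)
print(expanded)
```

With three frames, this reproduces the prompt shown in the example: three `<image>` lines followed by the unchanged `<video>` line and the question.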

Nov 03 '24 14:11 stdKonjac

hermes-2

Nov 19 '24 14:11 Lyken17

Hi @stdKonjac! There is a similar question in #87.

Dec 12 '24 18:12 danigarciaoca
