What is the conv_mode for VILA1.5-40b in video inference?

Open · stdKonjac opened this issue 1 year ago · 2 comments

Hi, what conv_mode should I use for VILA1.5-40b in video inference? Also, the <video> token does not seem to be handled in video inference: the eval code automatically adds several <image> tokens but leaves the <video> token untouched. For example:

<image>
<image>
<image>
<video>
Please describe the video

Is this behavior expected? I'd appreciate a quick response :) @Lyken17

stdKonjac, Nov 03 '24 14:11
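A minimal Python sketch of how the prompt above could arise, assuming (not confirmed from the VILA source) that the eval script prepends one <image> token per sampled frame instead of substituting the frames for <video>. Both helpers, `prepend_frame_tokens` and `expand_video_token`, are hypothetical and only model the two strategies; they are not VILA functions.

```python
# Hypothetical illustration only: these helpers are NOT part of VILA.
# They model two ways an eval script could insert per-frame <image> tokens.
IMAGE_TOKEN = "<image>"
VIDEO_TOKEN = "<video>"


def prepend_frame_tokens(prompt: str, num_frames: int) -> str:
    """Prepend one <image> line per frame and leave the prompt unchanged.

    With a prompt that contains <video>, this reproduces the output
    reported in the issue: the <video> token survives untouched.
    """
    return (IMAGE_TOKEN + "\n") * num_frames + prompt


def expand_video_token(prompt: str, num_frames: int) -> str:
    """Replace the first <video> placeholder with one <image> line per frame."""
    if VIDEO_TOKEN not in prompt:
        # No placeholder present: fall back to prepending the frame tokens.
        return prepend_frame_tokens(prompt, num_frames)
    frames = "\n".join([IMAGE_TOKEN] * num_frames)
    return prompt.replace(VIDEO_TOKEN, frames, 1)


if __name__ == "__main__":
    query = "<video>\nPlease describe the video"
    print(prepend_frame_tokens(query, 3))  # <video> is still present
    print("---")
    print(expand_video_token(query, 3))  # <video> replaced by frame tokens
```

The replacement strategy keeps exactly one image token per frame and leaves no leftover placeholder, whereas prepending only works cleanly when the query contains no <video> token at all.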

The conv_mode is hermes-2.

Lyken17, Nov 19 '24 14:11

Hi @stdKonjac! A similar question was asked in #87.

danigarciaoca, Dec 12 '24 18:12