VILA
What is the conv_mode for VILA1.5-40b in video inference?
Hi, what is the correct conv_mode for VILA1.5-40b in video inference?
Additionally, I noticed that the <video> token seems to be invalid in video inference. The eval code automatically adds several <image> tokens in front of the query, so the prompt becomes:
<image>
<image>
<image>
<video>
Please describe the video
Is this behavior normal? I would appreciate a timely response :) @Lyken17
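
To make the question concrete, here is a minimal sketch that reproduces the prompt shown above. It is not VILA's actual eval code: the function name `prepend_image_tokens`, its parameters, and the assumption of one `<image>` placeholder per sampled frame (three here) are hypothetical and only mirror the observed output.

```python
def prepend_image_tokens(query: str, num_frames: int, image_token: str = "<image>") -> str:
    """Hypothetical helper: put one image placeholder per frame before the query."""
    # Assumption: each sampled video frame is represented by its own "<image>" line.
    return f"{image_token}\n" * num_frames + query


# The original query still contains "<video>", which is left untouched.
prompt = prepend_image_tokens("<video>\nPlease describe the video", num_frames=3)
print(prompt)
```

If the real eval code works roughly like this, the `<video>` token would reach the model as plain text next to the `<image>` placeholders, which is the behavior I am asking about.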
hermes-2
Hi @stdKonjac! Similar question in #87