How to train a video inference model using this framework?

Open HAOYON-666 opened this issue 10 months ago • 1 comment

I want to train a multimodal video understanding model with this framework. I see that the NVILA-15B model supports video inference. What steps should I follow to train one?

HAOYON-666, Jan 13 '25 06:01

@yukang2017 can share details about video data preparation.

Lyken17, Feb 25 '25 09:02