VILA: How to train a video inference model using this framework?

Open HAOYON-666 opened this issue 10 months ago • 1 comment

I want to train a multimodal video understanding model. What should I do? I see that the NVILA-15B model supports video inference.

Jan 13 '25 06:01 HAOYON-666

@yukang2017 can share details about video data preparation.

Feb 25 '25 09:02 Lyken17