VILA
How to train a video inference model using this framework?
I want to train a multimodal video understanding model. What steps should I follow? I see that the NVILA-15B model supports video inference.
@yukang2017 can share details about video data preparation.