LWM
Vision-language model training data example for videos
Hi, thank you for your cool work!
I've read data.md but still can't work out how to build a dataset for training the vision-language model on videos.
Could anyone kindly share an example of the expected training data format?
Thanks