Ronglai Zuo
Thanks for your excellent work. Are there any instructions for fine-tuning video-llava on my own dataset?
My understanding is that the third stage should be instruction tuning. Is that correct?
I found that the inference results on my custom dataset differ when I run the evaluation code multiple times. I fixed the batch size to 4, and the...
What excellent work! Could you please share the GPU requirements (number of GPUs and memory per GPU) for pretraining and instruction tuning? Thanks.
Dear authors, I am trying to evaluate the model. I can get the hand vertex IDs, but how can I get the upper-body vertex IDs?