Video-LLaVA icon indicating copy to clipboard operation
Video-LLaVA copied to clipboard

Support multiple rounds of video conversations?

Open JiaweiZhao-git opened this issue 1 year ago • 4 comments

Great work! As video conversations in the instruction dataset have only one round in this version, if I want to train and test multiple rounds of video conversions, what should I do? Thanks!

JiaweiZhao-git avatar Jan 23 '24 03:01 JiaweiZhao-git

Simply just need to organize the multi-round conversation data in the format of llava_image_tune_.json. llava_image_tune_.json has examples of multi-round conversations in it, even though it is images.

For the dataset source you can use VideoChat.

LinB203 avatar Jan 23 '24 14:01 LinB203

Does this repo support inference and evaluation of multiple rounds of video conversations currently? Which file should I refer?

JiaweiZhao-git avatar Jan 25 '24 02:01 JiaweiZhao-git

Does this repo support inference and evaluation of multiple rounds of video conversations currently? Which file should I refer?

You can refer to this. But I'm not sure the second output of the model is useful.

LinB203 avatar Jan 25 '24 06:01 LinB203

Simply just need to organize the multi-round conversation data in the format of llava_image_tune_.json. llava_image_tune_.json has examples of multi-round conversations in it, even though it is images.

For the dataset source you can use VideoChat.

where can I get llava_image_tune_.json? this file is not contained in datasets

silence143 avatar Mar 26 '24 15:03 silence143