LLaVA
[Question] Difference between 13b-v0 and 13b-v1-1
Question
Thanks for this wonderful work!
I noticed that the weights of "LLaVA-13b-delta-v1-1" have been released on Hugging Face, alongside the former version v0.
What is the difference between v1-1 and v0? Does it concern the training data, the model, the training process, or just training time?
Hi @dejianchen1989, thank you for your interest in our work. The difference is that the base model switched from Vicuna-v0 to Vicuna-v1-1. In addition, the prompts were changed to a more standard format, and the tokenizer issue has been properly addressed in Vicuna v1.1.
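For readers wondering what "a more standard format" looks like concretely, below is a rough sketch of the two conversation styles. The strings here are approximations for illustration only; the exact templates are defined in llava/conversation.py.

```python
# Rough illustration of the two Vicuna conversation styles (approximate;
# see llava/conversation.py for the exact template definitions).

V0_SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)
V1_1_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def v0_prompt(user_msg: str) -> str:
    # v0 style: "Human"/"Assistant" roles, turns joined by a "###" separator.
    return f"{V0_SYSTEM}###Human: {user_msg}###Assistant:"

def v1_1_prompt(user_msg: str) -> str:
    # v1.1 style: plain "USER"/"ASSISTANT" roles separated by spaces; each
    # assistant turn is terminated with the "</s>" EOS token.
    return f"{V1_1_SYSTEM} USER: {user_msg} ASSISTANT:"
```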
Dear Liu,
If I want to reproduce LLaVA, which one would you recommend I use, Vicuna-v1.1 or Vicuna-v0? The repo supports both Vicuna-v0 and Vicuna-v1.1, right?
Also, which model are you using for your online demo?
Thank you.
Hi @RunsenXu, the repo supports both models. The latest online demo uses the V1.1 variant.
If you want to reproduce our results from the paper (e.g., ScienceQA), I would recommend using V0 first, as it is the model we used to train and evaluate the numbers reported in our paper.
If you want to mainly focus on improving the multimodal capability, you may start with V1.1, as it generally follows a more standard prompt format and sometimes performs slightly better than V0.
Thanks.
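As a side note for anyone reproducing either variant: the released checkpoints are delta weights that first need to be merged with the base LLaMA weights. A minimal sketch of the merge step, invoking the repo's llava.model.apply_delta entry point; all paths below are placeholders, and the delta repo id shown is for the v1.1 variant:

```python
# Minimal sketch: merge the released delta weights into base LLaMA weights
# by invoking the repo's CLI entry point. Paths are placeholders.
import subprocess

subprocess.run(
    [
        "python3", "-m", "llava.model.apply_delta",
        "--base", "/path/to/llama-13b",                # base LLaMA weights
        "--target", "/path/to/output/llava-13b-v1.1",  # merged model output
        "--delta", "liuhaotian/LLaVA-13b-delta-v1-1",  # released delta weights
    ],
    check=True,
)
```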
I see. Thank you very much!
> If you want to reproduce our results from the paper (e.g., ScienceQA), I would recommend using V0 first, as it is the model we used to train and evaluate the numbers reported in our paper.
@haotian-liu Thanks for this wonderful work! Could you share the accuracy metrics on ScienceQA when using V1.1? When I use Vicuna v1.1, I only get 86.06%.
vicuna-13b-v1.1 sqa acc: Total: 4241, Correct: 3650, Accuracy: 86.06%
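For reference, that figure is simply correct/total; a quick check of the arithmetic from the log line above:

```python
# Verify the reported ScienceQA accuracy from the counts in the log line.
correct, total = 3650, 4241
print(f"Accuracy: {correct / total:.2%}")  # -> Accuracy: 86.06%
```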