FreeVA
Looking forward to integrating more MLLMs, such as InstructBLIP and MiniGPT4-v2.
Yes, that's a natural next step. I've already been experimenting with more MLLMs and will release the results soon.
The current version seems to fail with LLaVA-1.6.
LLaVA-1.6 uses both the base features (336x336 resolution) and higher-resolution features. To perform inference similar to LLaVA-1.5, you only need to use the base features, which avoids introducing more tokens. I will update the code for LLaVA-1.6. In fact, I have completed experiments with LLaVA-1.6, InstructBLIP, and InternVL, and dense aggregation works well for all of them.
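For context, here is a minimal sketch of what "use only the base features" could look like; the tensor layout and function name are assumptions for illustration, not the repository's actual change:

```python
# Illustrative sketch only, not the actual one-line update in the repo.
# Assumption: LLaVA-1.6 ("anyres") encodes each image into a tensor of shape
# (num_patches, num_tokens, hidden_dim), where index 0 holds the base
# 336x336-resolution feature and the remaining entries are higher-resolution crops.
import torch

def keep_base_feature(image_features: torch.Tensor) -> torch.Tensor:
    """Drop the high-resolution crop features so the visual token count
    matches LLaVA-1.5-style inference."""
    return image_features[:1]  # keep only the base 336x336 feature

# Example: 5 patches (1 base + 4 crops), 576 tokens each, 1024-dim features
feats = torch.randn(5, 576, 1024)
print(keep_base_feature(feats).shape)  # torch.Size([1, 576, 1024])
```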
Good! Thank U!
I have just updated the code for LLaVA-1.6. Just one line. You can check it out :)
wow~ ⊙o⊙ Could you provide the corresponding experimental results?
Of course! I'm getting married next week, so I plan to update arXiv with these results in early June after that.