Looking forward to integrating more MLLMs, such as InstructBLIP and MiniGPT4-v2

Open King-king424 opened this issue 1 year ago • 2 comments

King-king424 avatar May 22 '24 11:05 King-king424

Yes, that's a natural next step. I've already been experimenting with more MLLMs and will release the results soon.

whwu95 avatar May 22 '24 16:05 whwu95

The current version seems to fail with LLaVA-1.6.

DemoGit4LIANG avatar May 24 '24 03:05 DemoGit4LIANG

LLaVA-1.6 uses both base features (336x336 resolution) and higher-resolution features. To run inference the same way as with 1.5, you only need the base features, which avoids introducing more tokens. I will update the code for LLaVA-1.6. In fact, I have already completed experiments with LLaVA-1.6, InstructBLIP, and InternVL, and dense aggregation works well for all of them.
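
A minimal sketch of the idea described above, not the repository's actual change. It shows why keeping only the base features holds the token count at the 1.5 level, and one plausible reading of dense aggregation as a token-wise mean over frames. The patch size of 14, the example of 5 tiles per image, the assumption that the base tile comes first, and all function names are hypothetical and chosen only for illustration.

```python
def tokens_per_tile(image_size=336, patch_size=14):
    """Number of visual tokens for one square tile (patch_size is an assumption)."""
    side = image_size // patch_size
    return side * side


def select_base_features(tile_features):
    """Keep only the base (336x336) tile, assumed to be first; drop higher-resolution tiles."""
    return tile_features[0]


def dense_aggregate(frame_features):
    """Token-wise mean over frames (an assumed reading of 'dense aggregation').

    frame_features: list of frames, each a list of tokens, each a list of floats.
    Returns one list of tokens with the same shape as a single frame.
    """
    num_frames = len(frame_features)
    return [
        [sum(values) / num_frames for values in zip(*tokens_at_position)]
        for tokens_at_position in zip(*frame_features)
    ]


# Token budget per frame: base tile only vs. base plus higher-resolution tiles.
base_only = tokens_per_tile()
with_high_res = 5 * tokens_per_tile()  # hypothetical: 1 base tile + 4 extra tiles

# Two toy frames, each with one base tile (1 token, 2 dims) and one extra tile.
frames = [
    [[[1.0, 2.0]], [[9.0, 9.0]]],
    [[[3.0, 4.0]], [[9.0, 9.0]]],
]
base_per_frame = [select_base_features(tiles) for tiles in frames]
video_tokens = dense_aggregate(base_per_frame)
```

Because the aggregation is applied after the base tile is selected, the aggregated video representation has the same number of tokens as a single base tile, regardless of how many frames or extra tiles the input had.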

whwu95 avatar May 31 '24 06:05 whwu95

Good! Thank you!

DemoGit4LIANG avatar May 31 '24 06:05 DemoGit4LIANG

I have just updated the code for LLaVA-1.6. It is only a one-line change. You can check it out :)

whwu95 avatar May 31 '24 06:05 whwu95

Wow~ ⊙o⊙ Could you share the corresponding experimental results?

King-king424 avatar May 31 '24 06:05 King-king424

Of course! I'm getting married next week, so I plan to update the arXiv paper with these results in early June, after the wedding.

whwu95 avatar May 31 '24 06:05 whwu95