LLaVA [Question] How can I get an attention map from LLaVA 1.5?

Open Lala-chick opened this issue 1 year ago • 2 comments

Question

Hello. Thank you for sharing such an impressive model. While using LLaVA, I would like to see which regions of the image the model attends to for a given prompt. Could you advise how to extract such an attention map?

Feb 05 '24 12:02 Lala-chick

I was wondering the same question today, but on LLaVA-1.6 👀

Feb 05 '24 23:02 SylvJalb

same question

Feb 19 '24 00:02 CN-Steve-Lee

same question!

Mar 21 '24 16:03 ahnchive

Same question, any solutions?

Mar 28 '24 07:03 GasolSun36

same question!

Apr 03 '24 07:04 jdsannchao

I’ve been looking into this for a while now. It definitely seems to be possible. See: https://arxiv.org/html/2404.01331v1#S4.F2 for an example :)

They cite this paper, which is extremely insightful. It has code examples that apply to raw CLIP models. I’m assuming the technique can be used for LLaVA-based models as well.

I’ll be doing some more digging, but if anyone else has figured this out, please reach out!

Apr 08 '24 05:04 AlecDusheck

take a look at this repo here: https://github.com/zjysteven/VLM-Visualizer

Jul 09 '24 15:07 sherzod-hakimov
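
For anyone landing here, a minimal sketch of one generic way to turn raw attention weights into an image heatmap. This is not necessarily what the paper or the repo linked above does. It assumes you have already extracted the language model's attention weights as an array of shape `(layers, heads, query_len, key_len)`, for example by requesting attentions from the model's forward pass. The function name `image_attention_map` and the `img_start` parameter (index of the first image token in the sequence) are hypothetical and depend on your prompt template. The default 24x24 grid assumes LLaVA-1.5's 336 px CLIP ViT-L/14 encoder, which yields 576 patch tokens; LLaVA-1.6 lays out image tokens differently, so the reshape would need adapting.

```python
import numpy as np


def image_attention_map(attn, img_start, grid=24):
    """Build a (grid, grid) heatmap from attention weights.

    attn:      array-like of shape (layers, heads, query_len, key_len)
    img_start: index of the first image token in the key sequence
               (assumption: image tokens are contiguous)
    grid:      patches per side; 24 assumes a 336 px image with 14 px patches
    """
    attn = np.asarray(attn, dtype=np.float64)
    if attn.ndim != 4:
        raise ValueError("expected attn of shape (layers, heads, query, key)")

    n_img = grid * grid

    # Average over layers and heads, then keep the last query position,
    # i.e. how the most recent token attends to every earlier token.
    last_row = attn.mean(axis=(0, 1))[-1]

    # Slice out the scores that fall on image tokens.
    img_scores = last_row[img_start:img_start + n_img]
    if img_scores.shape[0] != n_img:
        raise ValueError("image token span does not fit inside the sequence")

    heat = img_scores.reshape(grid, grid)

    # Min-max normalise to [0, 1] so the map can be overlaid on the image.
    span = heat.max() - heat.min()
    if span == 0:
        return np.zeros_like(heat)
    return (heat - heat.min()) / span
```

The resulting array can be upsampled to the input resolution and overlaid on the image with any plotting library. Averaging over all layers and heads is only one choice; restricting to a subset of layers often gives sharper maps.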
