
[Question] How can I get an attention map from LLaVA 1.5?

Open · Lala-chick opened this issue

Question

Hello. Thank you for sharing such an impressive model. While using LLaVA, I would like to see which parts of the image the model focuses on for a given prompt. Could you provide some guidance?

Lala-chick · Feb 05 '24
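A minimal sketch of the core step, assuming you already have per-layer attention weights. In Hugging Face `transformers`, passing `output_attentions=True` to the forward call returns `attentions`, a tuple with one tensor per layer of shape `(batch_size, num_heads, seq_len, seq_len)` (this may require loading the model with `attn_implementation="eager"`), and `attentions[layer][0].tolist()` gives the nested list the function below expects. The function name and the `image_start` parameter are hypothetical: where the image tokens sit depends on your prompt template and on whether the processor has already expanded `<image>` into one position per patch. LLaVA-1.5's CLIP ViT-L/14 encoder at 336 px yields a 24 × 24 grid of 576 patches.

```python
from typing import List

Matrix = List[List[float]]  # [query_pos][key_pos]


def image_attention_map(
    layer_attn: List[Matrix],  # one decoder layer: [num_heads][seq_len][seq_len]
    query_pos: int,            # which token's attention to inspect (e.g. the last one)
    image_start: int,          # hypothetical: index of the first image token in the sequence
    grid: int = 24,            # LLaVA-1.5 (CLIP ViT-L/14 @ 336 px) -> 24 x 24 = 576 patches
) -> Matrix:
    """Average one layer's attention over heads, keep only the image-token
    columns for a single query position, and reshape them into a grid."""
    num_heads = len(layer_attn)
    n_img = grid * grid
    # Mean over heads of the attention from `query_pos` to each image token.
    scores = [
        sum(layer_attn[h][query_pos][image_start + k] for h in range(num_heads)) / num_heads
        for k in range(n_img)
    ]
    # Renormalize so the image-token weights sum to 1 (the rest went to text tokens).
    total = sum(scores)
    if total > 0:
        scores = [s / total for s in scores]
    # Row-major reshape: patch k sits at row k // grid, column k % grid.
    return [scores[r * grid:(r + 1) * grid] for r in range(grid)]
```

Upsampling the resulting grid to the image resolution and overlaying it as a heatmap gives the kind of visualization asked about. Attention from a single layer tends to be noisy, so averaging the map over several layers may be worth trying.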

I was wondering the same thing today, but for LLaVA-1.6 👀

SylvJalb · Feb 05 '24

same question

CN-Steve-Lee · Feb 19 '24

same question!

ahnchive · Mar 21 '24

Same question, any solutions?

GasolSun36 · Mar 28 '24

same question!

jdsannchao · Apr 03 '24

I’ve been looking into this for a while now. It definitely seems to be possible. See: https://arxiv.org/html/2404.01331v1#S4.F2 for an example :)

They cite this paper, which is extremely insightful; its authors provide code examples that apply to raw CLIP models. I’m assuming the same technique can be used for LLaVA-based models as well.

I’ll be doing some more digging, but if anyone else has figured this out, reach out!

AlecDusheck · Apr 08 '24
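The cited paper's exact method isn't reproduced here. As a simpler, related technique swapped in for illustration, attention rollout (Abnar & Zuidema, 2020) composes the head-averaged attention of every layer, adding the identity to account for residual connections. Applied to the CLIP vision tower, row 0 (the CLS token) of the result, restricted to the patch columns, gives a patch relevance map. This is a minimal pure-Python sketch under those assumptions. Note that it ignores the prompt, and that LLaVA-1.5 passes patch features from the penultimate CLIP layer (not the CLS token) to the language model, so this is only a rough proxy for what LLaVA attends to.

```python
from typing import List

Matrix = List[List[float]]  # [seq][seq]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def attention_rollout(layers: List[List[Matrix]]) -> Matrix:
    """Attention rollout (Abnar & Zuidema, 2020).

    layers: for each encoder layer (first to last), a list of per-head
    [seq][seq] attention matrices whose rows sum to 1.
    Returns the [seq][seq] rolled-out attention from each token to the inputs.
    """
    seq = len(layers[0][0])
    # Start from the identity: before any layer, each token only "contains" itself.
    rollout = [[1.0 if i == j else 0.0 for j in range(seq)] for i in range(seq)]
    for heads in layers:
        # Average over heads, then add the identity for the residual connection.
        a = [[sum(h[i][j] for h in heads) / len(heads) + (1.0 if i == j else 0.0)
              for j in range(seq)] for i in range(seq)]
        # Renormalize each row so it is a probability distribution again.
        a = [[v / sum(row) for v in row] for row in a]
        # Compose with the earlier layers: this layer mixes their outputs.
        rollout = _matmul(a, rollout)
    return rollout
```

With the Hugging Face `CLIPVisionModel`, `output_attentions=True` returns per-layer tensors of shape `(batch, heads, seq, seq)`; for ViT-L/14 at 336 px, `seq` is 577 (CLS plus 576 patches), so `rollout[0][1:]` reshapes to a 24 × 24 grid. At that size you would want torch or NumPy matrix products rather than the pure-Python `_matmul` used here for clarity.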

Take a look at this repo: https://github.com/zjysteven/VLM-Visualizer

sherzod-hakimov · Jul 09 '24