BLIP2 Text Localization

Open RozDavid opened this issue 1 year ago • 4 comments

Hey all,

First of all, thanks for the cool project and the shared checkpoints. I was wondering whether there is any way to extract attention maps with respect to all query tokens using the QFormer module. In theory it should still have a cross-attention module similar to the one BLIP had within the text encoder's base model, but I can't find a way to access this information with normal callbacks.

All help is appreciated! Thanks a lot, David

RozDavid, Apr 12 '23 10:04