
BLIP2 Text Localization

Open RozDavid opened this issue 2 years ago • 5 comments

Hey all,

First of all, thanks for the cool project and the shared checkpoints. I was wondering if there is any way to extract attention maps with respect to all query tokens using the QFormer module. In theory it should still have a cross-attention module similar to the one BLIP had inside the text encoder's base model, but I can't find a way to access this information with the usual callbacks.

All help is appreciated! Thanks a lot, David
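One workaround, sketched below, is to register PyTorch forward hooks on the cross-attention blocks and capture the attention probabilities as they are computed. The actual module path in LAVIS (something like `model.Qformer.bert.encoder.layer[i].crossattention`) is an assumption based on the BERT-based Q-Former design and should be verified against your checkout; the toy module here stands in for it so the snippet runs standalone. An alternative worth trying is calling the Q-Former's BERT forward directly with `output_attentions=True`, which BERT-style encoders typically support.

```python
# Sketch: capturing cross-attention maps with PyTorch forward hooks.
# ToyCrossAttention is a hypothetical stand-in for a Q-Former
# cross-attention block (learned query tokens attending to image tokens),
# used so this example runs without downloading BLIP-2 weights.
import torch
import torch.nn as nn


class ToyCrossAttention(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, queries, image_feats):
        # Scaled dot-product attention: queries attend over image features.
        scores = self.q(queries) @ self.k(image_feats).transpose(-2, -1)
        attn = torch.softmax(scores / queries.size(-1) ** 0.5, dim=-1)
        return attn @ self.v(image_feats), attn


captured = {}


def save_attn(name):
    # Forward hook: output[1] is the attention map in this toy block.
    # In a real model, inspect the hooked module's return signature to
    # find where the attention probabilities live.
    def hook(module, inputs, output):
        captured[name] = output[1].detach()
    return hook


model = ToyCrossAttention()
model.register_forward_hook(save_attn("layer0.crossattention"))

queries = torch.randn(1, 32, 8)       # e.g. 32 learned query tokens
image_feats = torch.randn(1, 257, 8)  # e.g. ViT patch tokens + CLS
_ = model(queries, image_feats)

print(captured["layer0.crossattention"].shape)  # torch.Size([1, 32, 257])
```

Each captured map has shape `(batch, num_queries, num_image_tokens)`, so reshaping the image-token axis back to the patch grid gives a per-query spatial attention map for localization.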

RozDavid avatar Apr 12 '23 10:04 RozDavid

@RozDavid Do you figure it out? I am also thinking about how to implement the text localization with BLIP2.

Reagan1311 avatar Apr 24 '23 14:04 Reagan1311

Not yet :/

RozDavid avatar Apr 25 '23 19:04 RozDavid

Hi @RozDavid @Reagan1311, sorry to bother you. Are there any solutions yet for extracting cross-attention maps from BLIP-2? Thanks!

AntigoneRandy avatar Jan 09 '24 17:01 AntigoneRandy

Hi @RozDavid @Reagan1311 @AntigoneRandy, sorry to bother you. Do you know of any way to get text-localization results from BLIP-2? Thanks!

Yorkev avatar Mar 09 '24 13:03 Yorkev

Hi @RozDavid @Reagan1311 @AntigoneRandy @Yorkev, did you find any solutions to this? Thanks!

h-pal avatar Jul 29 '24 16:07 h-pal