LAVIS
BLIP2 Text Localization
Hey all,
First of all, thanks for the cool project and the shared checkpoints. I was wondering if there is any way to extract attention maps with respect to all query tokens using the Q-Former module. In theory it should still have a cross-attention module similar to the one BLIP had inside the text encoder's base model, but I can't find a way to access this information with the usual callbacks.
All help is appreciated! Thanks a lot, David
@RozDavid Do you figure it out? I am also thinking about how to implement the text localization with BLIP2.
Not yet :/
Hi @RozDavid @Reagan1311, sorry to bother you. Are there any solutions yet for extracting cross-attention maps from BLIP-2? Thanks!
Hi @RozDavid @Reagan1311 @AntigoneRandy, sorry to bother you. Do you know of any solution to get text-localization results from BLIP-2? Thanks!
Hi @RozDavid @Reagan1311 @AntigoneRandy @Yorkev, did you find any solutions to this? Thanks!
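For anyone still looking: one generic way to grab these maps is to register a PyTorch forward hook on the cross-attention module and record the attention weights on each forward pass. Below is a minimal sketch using a stand-in `nn.MultiheadAttention` layer in place of the real Q-Former cross-attention; the actual module path inside LAVIS (something like `model.Qformer.bert.encoder.layer[i].crossattention`) is an assumption you'd need to verify against the loaded model, and the real BERT-style layer may return attention probabilities in a different position of its output tuple.

```python
import torch
import torch.nn as nn

# Stand-in for one Q-Former cross-attention layer. In LAVIS the real module
# would be reached via the loaded model (hypothetical path, verify it):
#   model.Qformer.bert.encoder.layer[i].crossattention
cross_attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

attn_maps = []  # collected cross-attention maps, one entry per forward pass

def save_attn(module, inputs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights);
    # attn_weights has shape (batch, num_queries, num_image_tokens)
    # when averaged over heads (the default).
    attn_maps.append(output[1].detach())

handle = cross_attn.register_forward_hook(save_attn)

queries = torch.randn(1, 8, 32)       # e.g. 8 learned query tokens
image_feats = torch.randn(1, 16, 32)  # e.g. 16 image patch features
with torch.no_grad():
    cross_attn(queries, image_feats, image_feats, need_weights=True)

handle.remove()  # always detach the hook when done
print(attn_maps[0].shape)  # one attention row per query token over the image tokens
```

The captured map can then be reshaped to the image patch grid and upsampled for a rough text-localization heatmap, as people did with BLIP's GradCAM-style visualizations. Note that a BERT-based cross-attention layer only returns attention probabilities if it was called with `output_attentions=True`, so a hook on the inner attention submodule (or patching the forward) may be needed for the real model.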