ViT-Lens InstructBLIP and SEED Implementation

Open MichaelMaiii opened this issue 11 months ago • 2 comments

Hi, I checked the CLIP vision embedding (last hidden state) of BLIP-2 and InstructBLIP on Hugging Face (instructblip-vicuna-7b), and its dimension is 257x1408. However, the multimodal matching space of ViT-Lens uses a 1x768 embedding. How can InstructBLIP and SEED be used directly for text and image generation with ViT-Lens? Have they been fine-tuned?
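To make the mismatch concrete, here is a minimal sketch of the shape arithmetic. The 257x1408 and 1x768 shapes are the ones quoted above; the 224-pixel input size and 14-pixel patch size are assumptions on my part (they are consistent with 257 tokens, i.e. 256 patches plus one class token), and `vit_token_count` is a hypothetical helper, not part of any library.

```python
def vit_token_count(image_size: int, patch_size: int, has_cls_token: bool = True) -> int:
    """Number of tokens a ViT produces for a square image (hypothetical helper)."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if has_cls_token else 0)


# Assumed values: 224x224 input and 14x14 patches (not stated in the issue).
seq_len = vit_token_count(image_size=224, patch_size=14)

# Shapes quoted in the issue.
instructblip_shape = (seq_len, 1408)  # per-token last hidden state
vit_lens_shape = (1, 768)             # single pooled embedding

# The two differ in both sequence length and feature width.
shapes_match = instructblip_shape == vit_lens_shape
print(instructblip_shape, vit_lens_shape, shapes_match)
```

So the two outputs differ both in the number of tokens and in the feature dimension, which is why I am asking how the ViT-Lens output is connected to these models.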

Mar 10 '24 18:03 MichaelMaiii
