Blip feature extractor API
Thanks for the great work. I have some questions about the BLIP feature extractor interface.
- In the example code, you wrote:
  ```
  # torch.Size([1, 12, 768]), use features_multimodal[:,0,:] for multimodal classification tasks
  ```
  What are the other channels, `[:, 1:12, :]`, useful for?
- In the example code of the API, there is another attribute called `image_features` (link), but it does not seem to be available. Can you comment on the difference between `image_embeds` and `image_features`, and how to access the latter?
  ```
  print(features_image.image_embeds.shape)
  print(features_image.image_features.shape)
  ```
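For context, here is a minimal sketch of how I currently understand the slicing in the first question. I am assuming (not confirmed by the docs) that BLIP's multimodal encoder follows the usual BERT-style convention, where position 0 along the sequence dimension is the `[CLS]` token and the remaining positions are per-token hidden states; the tensor below is a random stand-in for the real `features_multimodal` output.

```python
import torch

# Stand-in for the multimodal feature tensor from the example code.
# Shape: (batch, sequence_length, hidden_dim) = (1, 12, 768).
features_multimodal = torch.randn(1, 12, 768)

# Position 0 is assumed to be the [CLS] token: a pooled representation
# commonly used as the input to a classification head.
cls_embedding = features_multimodal[:, 0, :]

# The remaining positions are assumed to be per-token hidden states,
# e.g. for token-level tasks or attention visualization.
token_embeddings = features_multimodal[:, 1:, :]

print(cls_embedding.shape)     # torch.Size([1, 768])
print(token_embeddings.shape)  # torch.Size([1, 11, 768])
```

Is that the intended reading, i.e. are the channels `[:, 1:12, :]` the per-token states?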
Thanks!