
BLIP feature extractor API

Open jhwang7628 opened this issue 1 year ago • 3 comments

Thanks for the great work. I have some questions about the BLIP feature extractor interface.

  1. In the example code, you wrote
# torch.Size([1, 12, 768]), use features_multimodal[:,0,:] for multimodal classification tasks

What are the other channels [:, 1:12, :] useful for?

  2. In the API example code, there is another attribute called image_features (link), but it is not available. Can you comment on the difference between image_embeds and image_features, and how to access the latter?
print(features_image.image_embeds.shape)
print(features_image.image_features.shape)
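
To make question 1 concrete, here is a minimal, dependency-free sketch of the two slices I am asking about. It assumes only the shape [1, 12, 768] quoted above. The nested lists and the `shape` helper are hypothetical stand-ins for a tensor and `tensor.shape`, and the values are dummies rather than real model output.

```python
# Placeholder for features_multimodal with shape [batch=1, tokens=12, dim=768].
# Nested lists stand in for a tensor; the values are dummies, not LAVIS output.
BATCH, TOKENS, DIM = 1, 12, 768
features = [[[float(t)] * DIM for t in range(TOKENS)] for _ in range(BATCH)]


def shape(x):
    """Return the nested-list shape as a tuple, similar to tensor.shape."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


# Equivalent of features[:, 0, :]: the first position of every sample.
first_token = [sample[0] for sample in features]

# Equivalent of features[:, 1:12, :]: the remaining 11 positions.
other_tokens = [sample[1:12] for sample in features]

print(shape(first_token))
print(shape(other_tokens))
```

The first slice is the one the example comment recommends for classification; the second slice holds the positions my question is about.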

Thanks!

jhwang7628, Nov 08 '22 01:11