LAVIS - A One-stop Library for Language-Vision Intelligence
Hi, I used BlipForConditionalGeneration from transformers for image captioning. I want to visualize the reason for each word of the generated caption, like Grad-CAM. I found code from ALBEF (https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb),...
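As context for the question above, the core Grad-CAM step is simple once the activations and their gradients have been captured (e.g. via hooks on a cross-attention layer, as in the linked ALBEF notebook). A minimal sketch of just that weighting step, with toy arrays standing in for real activations/gradients:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and their gradients.

    activations, gradients: arrays of shape (channels, H, W).
    """
    # alpha_k: global-average-pool the gradients per channel
    weights = gradients.mean(axis=(1, 2))
    # weighted sum of activation maps over channels
    cam = np.einsum("k,khw->hw", weights, activations)
    # ReLU keeps only features with positive influence on the target word
    cam = np.maximum(cam, 0)
    # normalize to [0, 1] so the map can be overlaid on the image
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# toy example: 8 channels, 7x7 spatial map
acts = np.random.rand(8, 7, 7)
grads = np.random.rand(8, 7, 7)
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (7, 7)
```

For word-by-word visualization you would run this once per generated token, using that token's score as the backward target; how to obtain the gradients for a BLIP cross-attention layer is model-specific and not shown here.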
Thank you so much for the code! It is very useful. Could you please also open-source the retrieval training based on BLIP-2? Any help is greatly appreciated.
Hello, I appreciate the work you've done. I would like to ask how to interpret the image-text retrieval score. I received a score like this:...
In my understanding, VQA is similar to the zero-shot image-to-text generation ability mentioned in the BLIP-2 paper. Both produce an answer conditioned on a prompt (a question or natural-language instructions)...
Thank you very much for your open-source contribution; the model's performance is amazing. If I want to obtain image features from the intermediate layers of the backbone,...
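For the question above, the usual PyTorch approach is a forward hook on the layer of interest. A minimal sketch with a toy backbone standing in for the real vision model (the module path you would actually hook, e.g. a specific encoder layer, depends on the model and is an assumption here):

```python
import torch
import torch.nn as nn

# Toy stand-in for a vision backbone; with a real model you would hook the
# desired submodule (its name/path is model-specific).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1),
)

features = {}

def save_output(name):
    # forward hook: stash the layer's output under a readable key
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

handle = backbone[0].register_forward_hook(save_output("layer0"))

x = torch.randn(1, 3, 32, 32)
_ = backbone(x)       # hook fires during the forward pass
handle.remove()       # clean up so the hook doesn't fire on later calls

print(features["layer0"].shape)  # torch.Size([1, 8, 32, 32])
```

The same pattern works for any `nn.Module`, including transformer encoder layers; keep a reference to the handle and remove it when done to avoid leaking hooks.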
Hi, thanks for open-sourcing this great work. Do you have examples for pre-training BLIP-2 on my own data?
Do you have a training config for blip2 vicuna instruct? Currently, using a VQA dataset with the "blip_question" text processor and a VQA task, I encounter an error at this line...
Are these models supported on an NVIDIA `Quadro RTX 5000`?