
LAVIS - A One-stop Library for Language-Vision Intelligence

Results: 299 LAVIS issues (sorted by recently updated)

Do you currently support multi-image input?
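For context, a common workaround is to batch several images along the first dimension and caption them independently; whether joint multi-image conditioning is supported is not confirmed by this issue. A minimal sketch, assuming the local files `img1.jpg` and `img2.jpg` as hypothetical placeholders:

```python
# Sketch of a batched (per-image) workaround with LAVIS; not joint multi-image reasoning.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

# "img1.jpg" / "img2.jpg" are hypothetical paths.
raw_images = [Image.open(p).convert("RGB") for p in ["img1.jpg", "img2.jpg"]]
batch = torch.stack([vis_processors["eval"](im) for im in raw_images]).to(device)

# One caption per image; the images are not fused into a single prompt.
captions = model.generate({"image": batch})
print(captions)
```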

![image](https://github.com/salesforce/LAVIS/assets/141383792/e67a04b3-290b-4e4a-8661-f9259f37a3e4) How should this problem be solved? Thank you!

I want to provide an image to BLIP-2 and have it generate a Chinese description in return. Can anyone guide me on how to do this?
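One hedged approach, since BLIP-2's language models are primarily English, is to caption in English with LAVIS and then translate the caption to Chinese with a separate translation model. A minimal sketch, where the Helsinki-NLP/opus-mt-en-zh checkpoint and the local file `photo.jpg` are assumptions:

```python
# Sketch: English caption via LAVIS BLIP-2, then English -> Chinese translation.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess
from transformers import pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

raw_image = Image.open("photo.jpg").convert("RGB")  # "photo.jpg" is a hypothetical path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

caption_en = model.generate({"image": image})[0]

# Translate the English caption with an off-the-shelf MT model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
caption_zh = translator(caption_en)[0]["translation_text"]
print(caption_zh)
```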

Hello, I am trying to reproduce the InstructBLIP paper's results on GQA and TextVQA. Using both the HuggingFace and the LAVIS versions of the models, I am consistently getting 5-10%...

Is there any way to get the result of text localization shown in Figure 2 of 'LAVIS: A One-stop Library for Language-Vision Intelligence'? ![github](https://github.com/salesforce/LAVIS/assets/128226689/cd9a6a3c-3c40-478d-b010-e8a186d7d758)

Hello LAVIS team, I've encountered an issue when trying to import models from the model zoo using different versions of the transformers library. Specifically, I've tried using transformers version 4.33.2...

Hi, everyone. I encountered the following errors while running the Hugging Face BLIP-2 demo. I executed the following code:
```
import os
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES'] = "3"
from PIL...
```
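For reference, a minimal runnable version of the Hugging Face BLIP-2 captioning setup that the truncated snippet appears to start might look like the sketch below; the checkpoint name and image path are assumptions, not taken from the issue:

```python
# Sketch of a minimal Hugging Face BLIP-2 captioning run (assumed checkpoint and image path).
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

image = Image.open("demo.jpg").convert("RGB")  # "demo.jpg" is a hypothetical path
inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)

generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```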

Dear authors, thanks for the great work! I would like to know the zero-shot performance of InstructBLIP on the OK-VQA dataset. However, it is not reported in the paper. I reproduced this and...

Dear maintainers, I'm currently trying to reproduce the zero-shot results of InstructBLIP. The caption of Table 5 says that for datasets with OCR tokens, the image query embeddings are simply appended...

Hello, I wonder where I can find all (or some) of the evaluation scripts to reproduce Table 1 of the InstructBLIP paper. I tried to reproduce the evaluation results for...
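As a starting point (not the authors' official evaluation pipeline), zero-shot InstructBLIP answers can be generated through the LAVIS model zoo and then scored with each dataset's own accuracy metric. A minimal sketch, where the checkpoint choice, image path, and question are assumptions:

```python
# Sketch: zero-shot InstructBLIP inference via the LAVIS model zoo (not the official eval script).
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)

raw_image = Image.open("gqa_example.jpg").convert("RGB")  # hypothetical image path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# The instruction/question text is passed as the prompt; the question here is made up.
answer = model.generate({"image": image, "prompt": "What color is the car on the left?"})
print(answer)
```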