
LAVIS - A One-stop Library for Language-Vision Intelligence

Results: 299 LAVIS issues (sorted by recently updated)

Do you currently support multi-image input?
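For context, a common workaround is to batch several images along the first dimension and caption them independently; whether joint multi-image conditioning is supported is not confirmed by this issue. A minimal sketch, assuming the local files `img1.jpg` and `img2.jpg` as hypothetical placeholders:

```python
# Sketch of a batched (per-image) workaround with LAVIS; not joint multi-image reasoning.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

# "img1.jpg" / "img2.jpg" are hypothetical paths.
raw_images = [Image.open(p).convert("RGB") for p in ["img1.jpg", "img2.jpg"]]
batch = torch.stack([vis_processors["eval"](im) for im in raw_images]).to(device)

# One caption per image; the images are not fused into a single prompt.
captions = model.generate({"image": batch})
print(captions)
```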

![image](https://github.com/salesforce/LAVIS/assets/141383792/e67a04b3-290b-4e4a-8661-f9259f37a3e4) How should this problem be solved? Thank you!

I want to provide an image to BLIP-2 and have it generate a Chinese description in return. Can anyone guide me on how to do this?
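One hedged approach, since BLIP-2's language models are primarily English, is to caption in English with LAVIS and then translate the caption to Chinese with a separate translation model. A minimal sketch, where the Helsinki-NLP/opus-mt-en-zh checkpoint and the local file `photo.jpg` are assumptions:

```python
# Sketch: English caption via LAVIS BLIP-2, then English -> Chinese translation.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess
from transformers import pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

raw_image = Image.open("photo.jpg").convert("RGB")  # "photo.jpg" is a hypothetical path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

caption_en = model.generate({"image": image})[0]

# Translate the English caption with an off-the-shelf MT model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
caption_zh = translator(caption_en)[0]["translation_text"]
print(caption_zh)
```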

Hello, I am trying to reproduce the InstructBLIP paper's results on GQA and TextVQA. Using both the HuggingFace and the LAVIS versions of the models, I am consistently getting 5-10%...

Is there any way to get the result of text localization shown in Figure 2 of 'LAVIS: A One-stop Library for Language-Vision Intelligence'? ![github](https://github.com/salesforce/LAVIS/assets/128226689/cd9a6a3c-3c40-478d-b010-e8a186d7d758)

Hello LAVIS team, I've encountered an issue when trying to import models from the model zoo using different versions of the transformers library. Specifically, I've tried using transformers version 4.33.2...

Hi, everyone. I encountered the following errors while running the Hugging Face BLIP-2 demo. I executed the following code:
```
import os
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES'] = "3"
from PIL...
```
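For reference, a minimal runnable version of the Hugging Face BLIP-2 captioning setup that the truncated snippet appears to start might look like the sketch below; the checkpoint name and image path are assumptions, not taken from the issue:

```python
# Sketch of a minimal Hugging Face BLIP-2 captioning run (assumed checkpoint and image path).
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

image = Image.open("demo.jpg").convert("RGB")  # "demo.jpg" is a hypothetical path
inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)

generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```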

Dear authors, thanks for the great work! I would like to know the zero-shot performance of InstructBLIP on the OK-VQA dataset. However, it is not reported in the paper. I reproduced this and...

Dear maintainers, I'm currently trying to reproduce the zero-shot results of InstructBLIP. The caption of Table 5 says that for datasets with OCR tokens, the image query embeddings are simply appended...

Hello, I wonder where I can find all (or some) of the evaluation scripts to reproduce Table 1 of the InstructBLIP paper. I tried to reproduce the evaluation results for...
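As a starting point (not the authors' official evaluation pipeline), zero-shot InstructBLIP answers can be generated through the LAVIS model zoo and then scored with each dataset's own accuracy metric. A minimal sketch, where the checkpoint choice, image path, and question are assumptions:

```python
# Sketch: zero-shot InstructBLIP inference via the LAVIS model zoo (not the official eval script).
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)

raw_image = Image.open("gqa_example.jpg").convert("RGB")  # hypothetical image path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# The instruction/question text is passed as the prompt; the question here is made up.
answer = model.generate({"image": image, "prompt": "What color is the car on the left?"})
print(answer)
```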