
LAVIS - A One-stop Library for Language-Vision Intelligence

282 LAVIS issues

I think that stage 1 training, i.e. the vision-language representation learning with the three objectives mentioned in the paper, is not yet implemented. Am I right? Not implemented `load_pretrained: False`...

I tried to train with the clip_L vision encoder (by adding vit_model: "clip_L" to the model train config), but it seems the QFormer checkpoint loaded by default at this line (https://github.com/salesforce/LAVIS/blob/main/lavis/models/base_model.py#L100) is...

I have to use transformers 4.27 because the latest version of clip-interrogator requires that specific version. After upgrading transformers from 4.26 to 4.27, I ran into this issue. ``` ╭─────────────────────────────── Traceback (most...

Hi, thank you for your great work on BLIP2. I found there is no zero-shot VQA evaluation code for BLIP2-OPT, so I created one, referring to the code for FLAN-T5. However,...

…mismatch in later XMLBert decoder. Fix issue #241

cla:signed

Can BLIP2 work with a small batch size like 64, 128, or 256?

Is it possible to do this without going from GPU -> CPU -> GPU? Code: ```py vis_images = [self.vis_processors["eval"](Image.fromarray(image.cpu().numpy())).unsqueeze(0).to('cuda').squeeze(0) for image in images] features_image = self.blip2_model.extract_features({"image": torch.stack(vis_images)}, mode="image") ``` extract_features...
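
If the frames are already CUDA tensors, one way to avoid the PIL/CPU roundtrip is to reproduce the eval transform directly on the GPU. The sketch below is an assumption, not LAVIS code: it presumes the configured `vis_processors["eval"]` is the usual 224×224 resize plus CLIP-style normalization, so the image size and mean/std constants should be checked against the actual processor.

```python
import torch
import torch.nn.functional as F

# Assumed preprocessing parameters; verify against vis_processors["eval"].
IMAGE_SIZE = 224
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073])
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711])

def preprocess_on_gpu(images: torch.Tensor) -> torch.Tensor:
    """Resize and normalize a (B, 3, H, W) uint8 CUDA batch without leaving the GPU."""
    # If the batch is (B, H, W, 3), permute to channels-first first:
    # images = images.permute(0, 3, 1, 2)
    x = images.float() / 255.0  # scale to [0, 1]
    x = F.interpolate(x, size=(IMAGE_SIZE, IMAGE_SIZE), mode="bicubic", align_corners=False)
    mean = CLIP_MEAN.to(x.device).view(1, 3, 1, 1)
    std = CLIP_STD.to(x.device).view(1, 3, 1, 1)
    return (x - mean) / std

# Hypothetical usage mirroring the snippet above:
# features_image = blip2_model.extract_features({"image": preprocess_on_gpu(images)}, mode="image")
```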

When I set use_dist_eval_sampler=True in retrieval evaluation, the results are very bad, but when I set use_dist_eval_sampler=False, the results are very good.
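
A likely explanation is that `use_dist_eval_sampler=True` shards the evaluation set across ranks with a `DistributedSampler`, which also pads the dataset with duplicates so every rank gets the same number of samples; if the per-rank outputs are not gathered and de-duplicated before computing retrieval metrics, scores from a single shard look much worse. A small illustrative sketch of that sharding behaviour (plain PyTorch, not LAVIS code):

```python
from torch.utils.data import DistributedSampler

dataset = list(range(10))  # toy eval set with 10 samples

# shuffle=False matches evaluation; num_replicas/rank are passed explicitly
# so no process group is needed for this demonstration.
for rank in range(4):
    sampler = DistributedSampler(dataset, num_replicas=4, rank=rank, shuffle=False)
    print(rank, list(sampler))

# Each rank receives ceil(10 / 4) = 3 indices, so two samples appear twice
# across ranks as padding; retrieval metrics should be computed only after
# gathering and de-duplicating results from all ranks.
```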

I trained a fine-tuned model with the command `python train.py --cfg-path lavis/projects/blip2/train/pretrain_stage2.yaml`. My env is ![image](https://user-images.githubusercontent.com/4124006/229704320-75e5e6ef-9c30-4a23-8261-ece7a2fc7638.png), but when I use the fine-tuned model to generate captions, this error happens: `RuntimeError: Sizes of...

I tried to run the code from Hugging Face (https://huggingface.co/Salesforce/blip2-opt-2.7b): ```python import requests from PIL import Image from transformers import BlipProcessor, Blip2ForConditionalGeneration processor = BlipProcessor.from_pretrained("Salesforce/blip2-opt-2.7b") ``` However, I get this...
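
For reference, the pattern on that model card pairs `Blip2Processor` (not `BlipProcessor`) with `Blip2ForConditionalGeneration`; a minimal sketch assuming transformers >= 4.27 and a CUDA device:

```python
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Blip2Processor is the processor class that matches the BLIP-2 checkpoints.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

# Example image (any RGB image works here).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```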