
LAVIS - A One-stop Library for Language-Vision Intelligence

Results: 282 LAVIS issues

Hi, how can I add the visual7w dataset for the VQA task? The adding-datasets documentation is for the AVSD task, and I'm not sure how to follow similar steps but...
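A minimal sketch of how a new dataset builder is usually registered in LAVIS, modeled on the existing VQA builders. The builder name, class name, config path, and the exact module paths for the base builder and VQA dataset classes are assumptions for illustration, not an existing part of the library:

```python
# Hypothetical builder for a visual7w-style VQA dataset. Names and paths below
# are assumptions; compare against the existing VQA builders in
# lavis/datasets/builders/ before copying.
from lavis.common.registry import registry
from lavis.datasets.builders.base_dataset_builder import BaseDatasetBuilder
from lavis.datasets.datasets.vqa_datasets import VQADataset, VQAEvalDataset


@registry.register_builder("visual7w")          # hypothetical dataset name
class Visual7WBuilder(BaseDatasetBuilder):      # hypothetical builder class
    train_dataset_cls = VQADataset
    eval_dataset_cls = VQAEvalDataset

    # Points at a dataset config (image root, annotation paths) you would add,
    # analogous to the existing VQA configs shipped with the library.
    DATASET_CONFIG_DICT = {
        "default": "configs/datasets/visual7w/defaults_vqa.yaml",  # hypothetical path
    }
```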

The issue is about the [text localization example](https://github.com/salesforce/LAVIS/blob/main/examples/blip_text_localization.ipynb). The input image is "../docs/_static/merlion.png" while the input caption is changed to "Merlion near marina bay. It is a city in Singapore....

Hi, thanks for the great stuff! Is there any plan to update the torch version (from `==1.10` to anything newer), or relax it?

Hello, thanks for your nice work! Are there scripts and configuration files that can be used to finetune CLIP on COCO and Flickr30K, like BLIP ([retrieval_coco_ft.yaml](https://github.com/salesforce/LAVIS/blob/main/lavis/projects/blip/train/retrieval_coco_ft.yaml) and [train_retrieval_coco](https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh))? Thanks again!

I've run the following piece of code ```python import torch from lavis.models import load_model, load_model_and_preprocess from PIL import Image device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # load sample image raw_image...
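For context, a complete, runnable version of that snippet might look like the sketch below. The image path and the `blip_caption`/`base_coco` checkpoint choice are placeholders; substitute whichever model the issue is actually about.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a sample image; the path is a placeholder.
raw_image = Image.open("docs/_static/merlion.png").convert("RGB")

# Load model and matching preprocessors for one commonly used checkpoint.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Preprocess the image and generate a caption.
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
captions = model.generate({"image": image})
print(captions)
```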

As I understand from the BLIP paper, NLVR takes a pair of images and a sentence about them, and predicts whether the sentence describes the image pair. I have used the...
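A rough sketch of running that two-image setup with the NLVR-finetuned BLIP checkpoint. The checkpoint name, the sample keys (`image0`, `image1`, `text_input`), and the `predict` entry point are assumptions; the exact interface (including whether a `label` field is required at eval time) should be checked against `lavis/models/blip_models/blip_nlvr.py`.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumed checkpoint name for the NLVR2-finetuned BLIP model.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_nlvr", model_type="nlvr", is_eval=True, device=device
)

img0 = vis_processors["eval"](Image.open("left.png").convert("RGB"))
img1 = vis_processors["eval"](Image.open("right.png").convert("RGB"))
text = txt_processors["eval"]("Both images contain at least one dog.")

# Sample keys follow the NLVR dataset convention and are an assumption here.
samples = {
    "image0": img0.unsqueeze(0).to(device),
    "image1": img1.unsqueeze(0).to(device),
    "text_input": [text],
}

with torch.no_grad():
    output = model.predict(samples)  # assumed to return logits over {False, True}
print(output)
```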

Hi, thank you for the great work! I wonder if there is any plan to incorporate TensorBoard visualization. Also, is there any plan to integrate `pytorch_lightning`?

When I run the BLIP captioning model with transformers 4.22.2, I get the following error: ```py import torch from lavis.models import load_model_and_preprocess device = torch.device("cuda" if torch.cuda.is_available() else "cpu")...
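Version mismatches between the installed `transformers` and the pin declared by the LAVIS release are a common cause of errors like this. A quick way to compare the two, assuming LAVIS was installed as the `salesforce-lavis` distribution (the exact pinned version varies by release):

```python
# Print the installed transformers version next to the requirement declared
# by the installed LAVIS distribution.
import importlib.metadata as md

import transformers

print("installed transformers:", transformers.__version__)
print(
    "LAVIS transformers requirement:",
    [r for r in (md.requires("salesforce-lavis") or []) if r.startswith("transformers")],
)
```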

Hi, is there a way to access the confidence of the generated caption?
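One workaround, following the repo's image-text matching example, is to score a generated caption against its image with the BLIP ITM head and treat the match probability as a confidence proxy. A sketch under that assumption; the model/type names and `match_head` argument mirror that example and should be verified against your LAVIS version, and the image path and caption are placeholders:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

raw_image = Image.open("docs/_static/merlion.png").convert("RGB")  # placeholder path
caption = "merlion statue near marina bay in singapore"            # e.g. a generated caption

# Load the BLIP image-text matching model, as in the repo's ITM example.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_image_text_matching", model_type="base", is_eval=True, device=device
)

img = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
txt = txt_processors["eval"](caption)

with torch.no_grad():
    itm_logits = model({"image": img, "text_input": txt}, match_head="itm")
    # Probability that the caption matches the image (class 1 of the ITM head).
    itm_prob = torch.softmax(itm_logits, dim=1)[:, 1].item()

print(f"image-text match probability: {itm_prob:.3f}")
```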