
LAVIS - A One-stop Library for Language-Vision Intelligence

Results: 282 LAVIS issues

Hi, how can I add the visual7w dataset for the VQA task? The adding-datasets documentation is for the AVSD task, and I'm not sure how to follow similar steps but...
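A minimal sketch of how a new dataset builder is usually registered in LAVIS, modeled on the existing VQA builders. The builder name, class name, config path, and the exact module paths for the base builder and VQA dataset classes are assumptions for illustration, not an existing part of the library:

```python
# Hypothetical builder for a visual7w-style VQA dataset. Names and paths below
# are assumptions; compare against the existing VQA builders in
# lavis/datasets/builders/ before copying.
from lavis.common.registry import registry
from lavis.datasets.builders.base_dataset_builder import BaseDatasetBuilder
from lavis.datasets.datasets.vqa_datasets import VQADataset, VQAEvalDataset


@registry.register_builder("visual7w")          # hypothetical dataset name
class Visual7WBuilder(BaseDatasetBuilder):      # hypothetical builder class
    train_dataset_cls = VQADataset
    eval_dataset_cls = VQAEvalDataset

    # Points at a dataset config (image root, annotation paths) you would add,
    # analogous to the existing VQA configs shipped with the library.
    DATASET_CONFIG_DICT = {
        "default": "configs/datasets/visual7w/defaults_vqa.yaml",  # hypothetical path
    }
```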

The issue is about the [text localization example](https://github.com/salesforce/LAVIS/blob/main/examples/blip_text_localization.ipynb). The input image is "../docs/_static/merlion.png" while the input caption is changed to "Merlion near marina bay. It is a city in Singapore....

Hi, thanks for the great stuff! Is there any plan to update the torch version (from `==1.10` to anything newer), or relax it?

Hello, thanks for your nice work! Are there scripts and configuration files that can be used to finetune CLIP on COCO and Flickr30K, like BLIP ([retrieval_coco_ft.yaml](https://github.com/salesforce/LAVIS/blob/main/lavis/projects/blip/train/retrieval_coco_ft.yaml) and [train_retrieval_coco](https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip/train/train_retrieval_coco.sh))? Thanks again!

I've run the following piece of code ```python import torch from lavis.models import load_model, load_model_and_preprocess from PIL import Image device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # load sample image raw_image...
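For context, a complete, runnable version of that snippet might look like the sketch below. The image path and the `blip_caption`/`base_coco` checkpoint choice are placeholders; substitute whichever model the issue is actually about.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a sample image; the path is a placeholder.
raw_image = Image.open("docs/_static/merlion.png").convert("RGB")

# Load model and matching preprocessors for one commonly used checkpoint.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Preprocess the image and generate a caption.
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
captions = model.generate({"image": image})
print(captions)
```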

As I understand from the BLIP paper, NLVR takes a pair of images and a sentence about them, and predicts whether the sentence describes the image pair. I have used the...
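A rough sketch of running that two-image setup with the NLVR-finetuned BLIP checkpoint. The checkpoint name, the sample keys (`image0`, `image1`, `text_input`), and the `predict` entry point are assumptions; the exact interface (including whether a `label` field is required at eval time) should be checked against `lavis/models/blip_models/blip_nlvr.py`.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumed checkpoint name for the NLVR2-finetuned BLIP model.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_nlvr", model_type="nlvr", is_eval=True, device=device
)

img0 = vis_processors["eval"](Image.open("left.png").convert("RGB"))
img1 = vis_processors["eval"](Image.open("right.png").convert("RGB"))
text = txt_processors["eval"]("Both images contain at least one dog.")

# Sample keys follow the NLVR dataset convention and are an assumption here.
samples = {
    "image0": img0.unsqueeze(0).to(device),
    "image1": img1.unsqueeze(0).to(device),
    "text_input": [text],
}

with torch.no_grad():
    output = model.predict(samples)  # assumed to return logits over {False, True}
print(output)
```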

Hi, thank you for the great work! I wonder if there is any plan to incorporate TensorBoard visualization. Also, is there any plan to integrate `pytorch_lightning`?

When I run the BLIP captioning model with transformers 4.22.2, I get the following error: ```py import torch from lavis.models import load_model_and_preprocess device = torch.device("cuda" if torch.cuda.is_available() else "cpu")...
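Version mismatches between the installed `transformers` and the pin declared by the LAVIS release are a common cause of errors like this. A quick way to compare the two, assuming LAVIS was installed as the `salesforce-lavis` distribution (the exact pinned version varies by release):

```python
# Print the installed transformers version next to the requirement declared
# by the installed LAVIS distribution.
import importlib.metadata as md

import transformers

print("installed transformers:", transformers.__version__)
print(
    "LAVIS transformers requirement:",
    [r for r in (md.requires("salesforce-lavis") or []) if r.startswith("transformers")],
)
```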

Hi, is there a way to access the confidence of the generated caption?
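One workaround, following the repo's image-text matching example, is to score a generated caption against its image with the BLIP ITM head and treat the match probability as a confidence proxy. A sketch under that assumption; the model/type names and `match_head` argument mirror that example and should be verified against your LAVIS version, and the image path and caption are placeholders:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

raw_image = Image.open("docs/_static/merlion.png").convert("RGB")  # placeholder path
caption = "merlion statue near marina bay in singapore"            # e.g. a generated caption

# Load the BLIP image-text matching model, as in the repo's ITM example.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_image_text_matching", model_type="base", is_eval=True, device=device
)

img = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
txt = txt_processors["eval"](caption)

with torch.no_grad():
    itm_logits = model({"image": img, "text_input": txt}, match_head="itm")
    # Probability that the caption matches the image (class 1 of the ITM head).
    itm_prob = torch.softmax(itm_logits, dim=1)[:, 1].item()

print(f"image-text match probability: {itm_prob:.3f}")
```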