LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Hi, I'm using BLIP-2, and loading the models onto the GPU (with weights already cached) is very slow: `AutoModelForCausalLM.from_pretrained('facebook/opt-6.7b', cache_dir=".")` and `load_model_and_preprocess(name="img2prompt_vqa", model_type="base", is_eval=True, device=device)` each take a few minutes. Is...
According to the Transformers documentation from Hugging Face, beam-search multinomial sampling can be enabled by setting `num_beams>1` and `do_sample=True`. However, this is not supported in LAVIS. If I set `num_beams=4, num_return_sequences=4` and...
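For reference, a minimal sketch of beam-search multinomial sampling with plain Transformers, assuming Hub access; the tiny `sshleifer/tiny-gpt2` checkpoint stands in for a real model purely to keep the example light:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tiny stand-in checkpoint (assumption: network access to the Hugging Face Hub)
name = "sshleifer/tiny-gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("a photo of", return_tensors="pt")
# Beam-search multinomial sampling: num_beams > 1 combined with do_sample=True;
# num_return_sequences must not exceed num_beams.
out = model.generate(
    **inputs,
    num_beams=4,
    do_sample=True,
    num_return_sequences=4,
    max_new_tokens=10,
    pad_token_id=tok.eos_token_id,
)
print(out.shape[0])  # number of returned sequences
```

Whether LAVIS forwards these arguments to `generate()` is model-specific, which appears to be the point of the issue.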
Dear LAVIS team, as part of a project we are trying to fine-tune BLIP Retrieval with a custom dataset on two RTX 3090 24 GB GPUs. 1) We are getting the following...
I was trying to reproduce results with BLIP on VQAv2 test-dev and I observed a non-negligible difference between the VQA accuracy obtained using the [published checkpoint](https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_vqa_capfilt_large.pth) (**77.41%**) and the number...
sub.json is organized in the format `[{'image': '4385058960_b0f291553e.jpg', 'caption': 'a wooden chair in the living room', 'url': 'http://static.flickr.com/2723/4385058960_b0f291553e.jpg'}, ...]`, but the downloaded sbu_images.rar extracts into numbered subdirectories: 0000/ 0001/ 0002/ 0003/...
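One way to bridge the two layouts is to index the extracted shard directories by bare filename and then resolve each annotation entry against that index. A minimal sketch, assuming the images sit somewhere under the numbered subdirectories (`build_index` and `resolve` are hypothetical helper names):

```python
from pathlib import Path

def build_index(root):
    """Map bare filename -> full path by walking the shard directories
    (0000/, 0001/, ...) recursively."""
    return {p.name: p for p in Path(root).rglob("*.jpg")}

def resolve(entries, index):
    """Attach the on-disk path to each annotation entry; entries whose
    image file was not found are skipped."""
    out = []
    for e in entries:
        p = index.get(e["image"])
        if p is not None:
            out.append({**e, "path": str(p)})
    return out
```

Building the index once up front avoids a per-entry directory scan over tens of thousands of files.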
The requirements file was not updated, causing a few small issues when running `pip install -e .` and trying to run the demo. 1. The `torchvision` version is not specified, thus...
Apart from `spacy`, VQA also requires the `en_core_web_sm` model. This PR is meant to be merged alongside https://github.com/salesforce/LAVIS/pull/75, but is kept separate since it's a bit hacky. Still...
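For context, `en_core_web_sm` is distributed as a separate package rather than with the `spacy` wheel, which is why a plain requirements file misses it. A sketch of the two usual install routes (the pinned version below is only an example):

```shell
# Route 1: spaCy's own downloader
python -m spacy download en_core_web_sm

# Route 2: pin the model as a direct URL in requirements.txt
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
```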
Hi, thank you for the great work in publishing this repository. I'm trying to evaluate CLIP on text-to-image retrieval on Flickr30k by running `evaluate.py` with `--cfg-path lavis/projects/clip/exp_flickr_ret_eval.yaml`. However,...