LAVIS - A One-stop Library for Language-Vision Intelligence
Hello, I am currently evaluating the BLIP-2 model for one of my use cases, where I need to assess the similarity between text and images. For...
In Vicuna-7b-v1.1's config.json, there is:
```
"bos_token_id": 0,
"eos_token_id": 1,
"pad_token_id": -1,
```
In its generation_config.json, there is:
```
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
```
But actually, this...
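A minimal sketch of the discrepancy the issue describes: the helper below is hypothetical (not part of LAVIS or transformers) and simply compares the special-token IDs copied from the two files above.

```python
# Token IDs copied from the issue: config.json vs. generation_config.json
# of Vicuna-7b-v1.1. The helper is illustrative, not a library API.
config_json = {"bos_token_id": 0, "eos_token_id": 1, "pad_token_id": -1}
generation_config_json = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}

def token_id_mismatches(cfg: dict, gen_cfg: dict) -> dict:
    """Return {key: (config_value, generation_config_value)} for IDs that disagree."""
    return {
        key: (cfg[key], gen_cfg[key])
        for key in cfg.keys() & gen_cfg.keys()
        if cfg[key] != gen_cfg[key]
    }

for key, (a, b) in sorted(token_id_mismatches(config_json, generation_config_json).items()):
    print(f"{key}: config.json={a} generation_config.json={b}")
```

As the output shows, all three special-token IDs disagree between the two files, which is likely why generation behaves differently depending on which config is loaded.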
Hello, thanks for your great work! In `blip2_vicuna_instruct.py`, the `bos_token` of the LLM has been changed. Originally, it is `<s>` with idx 1. But after the following code:
```
self.llm_tokenizer.add_special_tokens({'pad_token':...
```
For reference, I find that the configs of [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1/tree/main) and [lmsys/vicuna-7b-v1.1](https://huggingface.co/lmsys/vicuna-7b-v1.1) are different, i.e. they have different bos_token_id, eos_token_id, and pad_token_id, and only eachadea/vicuna-7b-1.1 works well with InstructBLIP....
Dear authors, Thank you for your great work, InstructBLIP! I'd like to train InstructBLIP with my own instruction data. Could you provide an example data file, or the data-generation code?...
Thanks for your wonderful work. I am trying to pre-train InstructBLIP from scratch on 4x4 A100s. However, GPU memory slowly increases as training progresses, which leads to OUT-OF-MEMORY...
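One common cause of slowly growing memory in PyTorch training loops (whether it applies to this report is an assumption) is accumulating loss *tensors*, which keep their autograd graphs alive, instead of plain floats via `loss.item()`. A torch-free sketch of the pattern, where `FakeLoss` is a hypothetical stand-in for a loss tensor:

```python
# Hypothetical stand-in for a loss tensor: like a real torch loss, it holds a
# reference to a large object (the autograd graph); .item() returns the scalar.
class FakeLoss:
    def __init__(self, value: float):
        self.value = value
        self.graph = bytearray(1_000_000)  # stands in for the retained graph

    def item(self) -> float:
        return self.value

def accumulate_leaky(losses):
    """BAD: storing the loss objects keeps every 'graph' alive across steps."""
    return [loss for loss in losses]

def accumulate_ok(losses):
    """GOOD: store only the plain float, so each graph can be freed."""
    return [loss.item() for loss in losses]

losses = [FakeLoss(0.5 * i) for i in range(3)]
print(accumulate_ok(losses))  # [0.0, 0.5, 1.0]
```

In a real loop the same rule reads `running_loss += loss.item()` rather than `running_loss += loss`; whether this is the cause of the OOM above would need profiling.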
Dear Author, I am currently running BLIP-2 Instruct; the code really helps, but I only have two 3090s available. Would you please consider updating the code to support multi-GPU training? Thanks
In the BLIP-2 paper: "We propose Q-Former as the trainable module to bridge the gap between a frozen image encoder and a frozen LLM. It extracts a fixed number of...
Hi, thanks for the repository and code. I'd like to run the stylization notebook from [here](https://github.com/salesforce/LAVIS/blob/main/projects/blip-diffusion/notebooks/stylization.ipynb). When calling it, I receive the following error:
```python
import torch
import numpy as...
```
When I test instructed zero-shot vision-to-language generation, I get this kind of output. Can anybody tell me what's wrong? ['10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'] The model I used is: model, vis_processors, _ =...