
LAVIS - A One-stop Library for Language-Vision Intelligence

282 LAVIS issues

Hi, I notice that in `blip2_qformer.py`, in the `forward` function, the `text_tokens` are truncated to a `max_length` of 32, while in the `extract_feature` function, which to my understanding is an inference...
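For reference, a minimal sketch of what a 32-token truncation does, using the `bert-base-uncased` tokenizer that the Q-Former is built on; the exact tokenization calls in `blip2_qformer.py` may differ from this.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
caption = "a long, detailed caption " * 20

# Training-style tokenization: everything beyond 32 tokens is silently dropped.
train_tokens = tokenizer(caption, truncation=True, max_length=32, return_tensors="pt")

# Without an explicit max_length, the full caption is kept.
full_tokens = tokenizer(caption, return_tensors="pt")

print(train_tokens.input_ids.shape)  # torch.Size([1, 32])
print(full_tokens.input_ids.shape)   # full sequence length
```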

The role of modeling_opt.py in the BLIP2 model

Why do I always encounter a CUDA out-of-memory error when I call the `load_model_process` function? Can an RTX 3090 be used for the BLIP-2 model?
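Assuming the question refers to LAVIS's `load_model_and_preprocess`, a minimal loading sketch; the 2.7B OPT checkpoint is the variant most likely to fit in an RTX 3090's 24 GB, while larger variants may not.

```python
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# "pretrain_opt2.7b" is the smallest BLIP-2 OPT checkpoint; the OPT-6.7B and
# FlanT5-XXL variants need considerably more memory and may OOM on a 24 GB card.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_opt",
    model_type="pretrain_opt2.7b",
    is_eval=True,
    device=device,
)
```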

I would like to request support for converting the BLIP-2 model to ONNX. I have tried to convert the model using the `torch.onnx.export` method, but there are issues as the...
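A common starting point is exporting only the vision tower, since the language model's `generate()` path has dynamic control flow that is hard to trace into a static ONNX graph. A hedged sketch, assuming the ViT sub-module is exposed as `model.visual_encoder`:

```python
import torch
from lavis.models import load_model_and_preprocess

model, _, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device="cpu"
)

# Export only the image encoder; the full pipeline (Q-Former + LLM with
# generate()) is much harder to express as a single ONNX graph.
dummy_image = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model.visual_encoder,      # assumption: the ViT sub-module is named visual_encoder
    dummy_image,
    "blip2_visual_encoder.onnx",
    input_names=["image"],
    output_names=["image_embeds"],
    opset_version=17,
)
```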

Hi, can you add the VQA fine-tuning functionality for BLIP-2? In the paper, when you fine-tune on the VQA task, you also fine-tune the image encoder. When I use the `freeze_vit:...
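If the goal is just to make the image encoder trainable, a rough sketch under the assumption that the ViT lives at `model.visual_encoder`; the configuration route the issue mentions (`freeze_vit` in the model's YAML config) is the intended mechanism, and this is only an after-the-fact approximation.

```python
import torch
from lavis.models import load_model_and_preprocess

model, _, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=False, device="cuda"
)

# Make the ViT parameters trainable again so the optimizer also updates the
# image encoder during fine-tuning.
for param in model.visual_encoder.parameters():
    param.requires_grad = True
```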

I'm new to ML and would like your help. Is there a useful document with instructions on fine-tuning? I want to fine-tune a pre-trained BLIP-2 for image captioning with my own...

I want to train it from scratch with my own dataset.

Is there an existing script to accomplish this task? If not, can someone give some basic guidance on how I could write one myself?
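For the captioning fine-tuning questions above, a bare-bones training-loop outline; the official entry point is LAVIS's `train.py` driven by a YAML config, so treat this only as a sketch. `MyCaptionDataset` and its fields are placeholders.

```python
import torch
from torch.utils.data import DataLoader
from lavis.models import load_model_and_preprocess

device = torch.device("cuda")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="caption_coco_opt2.7b", is_eval=False, device=device
)

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

# MyCaptionDataset is a placeholder: each item is assumed to be a dict with a
# PIL "image" and a "caption" string. collate_fn=list keeps the raw dicts.
loader = DataLoader(MyCaptionDataset(), batch_size=4, shuffle=True, collate_fn=list)

model.train()
for batch in loader:
    images = torch.stack(
        [vis_processors["train"](ex["image"]) for ex in batch]
    ).to(device)
    captions = [ex["caption"] for ex in batch]

    # BLIP-2 models in LAVIS return a dict containing the language-modeling loss.
    loss = model({"image": images, "text_input": captions})["loss"]

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```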

Can existing large datasets be used to fine-tune the BLIP-2 captioning task? The dataset I want to use is the UFine6926 dataset, as its text descriptions of images are very fine-grained, with an...

Starting from the tutorial [link](https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb) and considering the function **compute_gradcam** in BlipITM [link](https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip_models/blip_image_text_matching.py), I'm trying to obtain the same result but using Blip2ITM. The function **getAttMap** is at [link](https://github.com/salesforce/LAVIS/blob/main/lavis/common/gradcam.py). This is...