
LAVIS - A One-stop Library for Language-Vision Intelligence

282 LAVIS issues

Hi, I notice that in `blip2_qformer.py`, in the `forward` function, the `text_tokens` are truncated to a `max_length` of 32, while in the `extract_feature` function, which to my understanding is an inference...
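For reference, a minimal sketch of what a 32-token truncation does, using the `bert-base-uncased` tokenizer that the Q-Former is built on; the exact tokenization calls in `blip2_qformer.py` may differ from this.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
caption = "a long, detailed caption " * 20

# Training-style tokenization: everything beyond 32 tokens is silently dropped.
train_tokens = tokenizer(caption, truncation=True, max_length=32, return_tensors="pt")

# Without an explicit max_length, the full caption is kept.
full_tokens = tokenizer(caption, return_tensors="pt")

print(train_tokens.input_ids.shape)  # torch.Size([1, 32])
print(full_tokens.input_ids.shape)   # full sequence length
```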

The role of modeling_opt.py in the BLIP2 model

Why do I always encounter a CUDA out-of-memory error when I call the `load_model_process` function? Can an RTX 3090 be used for the BLIP-2 model?
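Assuming the question refers to LAVIS's `load_model_and_preprocess`, a minimal loading sketch; the 2.7B OPT checkpoint is the variant most likely to fit in an RTX 3090's 24 GB, while larger variants may not.

```python
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# "pretrain_opt2.7b" is the smallest BLIP-2 OPT checkpoint; the OPT-6.7B and
# FlanT5-XXL variants need considerably more memory and may OOM on a 24 GB card.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_opt",
    model_type="pretrain_opt2.7b",
    is_eval=True,
    device=device,
)
```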

I would like to request support for converting the BLIP-2 model to ONNX. I have tried to convert the model using the `torch.onnx.export` method, but there are issues as the...
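A common starting point is exporting only the vision tower, since the language model's `generate()` path has dynamic control flow that is hard to trace into a static ONNX graph. A hedged sketch, assuming the ViT sub-module is exposed as `model.visual_encoder`:

```python
import torch
from lavis.models import load_model_and_preprocess

model, _, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device="cpu"
)

# Export only the image encoder; the full pipeline (Q-Former + LLM with
# generate()) is much harder to express as a single ONNX graph.
dummy_image = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model.visual_encoder,      # assumption: the ViT sub-module is named visual_encoder
    dummy_image,
    "blip2_visual_encoder.onnx",
    input_names=["image"],
    output_names=["image_embeds"],
    opset_version=17,
)
```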

Hi, can you add the VQA fine-tuning functionality for BLIP-2? In the paper, when you fine-tune on the VQA task, you also fine-tune the image encoder. When I use the `freeze_vit:...
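If the goal is just to make the image encoder trainable, a rough sketch under the assumption that the ViT lives at `model.visual_encoder`; the configuration route the issue mentions (`freeze_vit` in the model's YAML config) is the intended mechanism, and this is only an after-the-fact approximation.

```python
import torch
from lavis.models import load_model_and_preprocess

model, _, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=False, device="cuda"
)

# Make the ViT parameters trainable again so the optimizer also updates the
# image encoder during fine-tuning.
for param in model.visual_encoder.parameters():
    param.requires_grad = True
```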

I'm new to ML and would like your help. Is there a useful document with instructions on fine-tuning? I want to fine-tune a pre-trained BLIP-2 for image captioning with my own...

I want to train it from scratch with my own dataset.

Is there an existing script to accomplish this task? If not, can someone give some basic guidance on how I could write one myself?
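For the captioning fine-tuning questions above, a bare-bones training-loop outline; the official entry point is LAVIS's `train.py` driven by a YAML config, so treat this only as a sketch. `MyCaptionDataset` and its fields are placeholders.

```python
import torch
from torch.utils.data import DataLoader
from lavis.models import load_model_and_preprocess

device = torch.device("cuda")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="caption_coco_opt2.7b", is_eval=False, device=device
)

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

# MyCaptionDataset is a placeholder: each item is assumed to be a dict with a
# PIL "image" and a "caption" string. collate_fn=list keeps the raw dicts.
loader = DataLoader(MyCaptionDataset(), batch_size=4, shuffle=True, collate_fn=list)

model.train()
for batch in loader:
    images = torch.stack(
        [vis_processors["train"](ex["image"]) for ex in batch]
    ).to(device)
    captions = [ex["caption"] for ex in batch]

    # BLIP-2 models in LAVIS return a dict containing the language-modeling loss.
    loss = model({"image": images, "text_input": captions})["loss"]

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```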

Can existing large datasets be used to fine-tune the BLIP-2 captioning task? The dataset I want to use is the UFine6926 dataset, as its text descriptions of images are very fine-grained, with an...

Starting from the tutorial [link](https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb) and considering the function **compute_gradcam** in BlipITM [link](https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip_models/blip_image_text_matching.py), I'm trying to obtain the same result but using Blip2ITM. The function **getAttMap** is at [link](https://github.com/salesforce/LAVIS/blob/main/lavis/common/gradcam.py). This is...