[FEATURES] Support image encoder with image caption as example

Open lianqing11 opened this issue 1 year ago • 2 comments

Support image encoder with image caption as example.

Try model: BLIP with Salesforce/blip-image-captioning-bas
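For reference, a minimal sketch of what captioning with a BLIP checkpoint looks like through the Hugging Face `transformers` API. This is not LMFlow code: the helper name `caption_image` and its parameters are made up for illustration, and the checkpoint name is left as an argument so the caller supplies it.

```python
def caption_image(model_name: str, image_path: str, prompt: str = "") -> str:
    """Return a caption for the image at image_path using a BLIP checkpoint.

    caption_image is a hypothetical helper, not part of LMFlow.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")

    # With a prompt the caption continues the given text;
    # without one the model captions the image unconditionally.
    if prompt:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        inputs = processor(images=image, return_tensors="pt")

    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Calling `caption_image(checkpoint, "photo.jpg")` downloads the checkpoint on first use, so it needs network access plus `torch`, `transformers`, and `Pillow` installed.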

Discussion:

  • the name of arch_type: visionEncoder_decoder
  • format the data with image_text
  • Should we generate another class inherited from HFEncoderDecoder?
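To make the `image_text` point concrete, here is one possible layout, sketched under the assumption that it would mirror the `type` / `instances` structure of the existing text datasets. The `image` and `text` field names and both helper functions are hypothetical and only meant as a starting point for the discussion.

```python
import json

# Assumed per-instance fields; nothing in this thread fixes these names.
REQUIRED_KEYS = {"image", "text"}


def build_image_text_dataset(pairs):
    """Wrap (image_path, caption) pairs in a hypothetical image_text layout."""
    return {
        "type": "image_text",
        "instances": [{"image": path, "text": caption} for path, caption in pairs],
    }


def validate_image_text_dataset(dataset):
    """Raise ValueError if the dataset does not follow the assumed layout."""
    if dataset.get("type") != "image_text":
        raise ValueError("expected type 'image_text'")
    for index, instance in enumerate(dataset.get("instances", [])):
        missing = REQUIRED_KEYS - instance.keys()
        if missing:
            raise ValueError(f"instance {index} is missing {sorted(missing)}")


dataset = build_image_text_dataset([("images/cat.jpg", "a cat sitting on a sofa")])
validate_image_text_dataset(dataset)
print(json.dumps(dataset, indent=2))
```

Storing the image as a path rather than inline bytes keeps the JSON small and leaves decoding to the data loader.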

lianqing11 avatar Jun 09 '23 10:06 lianqing11

I have fixed the code style issues.

Regarding "[Question] line 286, 294: will this cause problem for inference of encoder-decoder models?": the pad_token_id argument is added to the kwargs, so it will not cause problems for inference with encoder-decoder models.

I will add more features in this PR.

lianqing11 avatar Jun 12 '23 13:06 lianqing11

I have updated the demo with a multi-modality conversation function. The example script is:

 export model="Salesforce/blip2-opt-2.7b"
 deepspeed examples/inference.py \
     --deepspeed configs/ds_config_multimodal.json \
     --model_name_or_path "${model}" \
     --arch_type vision_encoder_decoder \
     --input_text "please describe the image:" \
     --task vqa

lianqing11 avatar Jun 15 '23 13:06 lianqing11