LMFlow
[FEATURES] Support image encoder with image caption as example
Try model: BLIP, with `Salesforce/blip-image-captioning-base`.
Discussion points:
- the name of `arch_type`: `visionEncoder_decoder`
- the data format: `image_text`
- should we add a new class that inherits from `HFEncoderDecoder`?
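To make the `image_text` point concrete, here is a hypothetical sketch of what such a record could look like; the field names (`type`, `instances`, `image`, `text`) are assumptions for illustration, not a finalized schema:

```python
# Hypothetical "image_text" record: each instance pairs an image path
# with its caption text, under a top-level type tag.
sample = {
    "type": "image_text",
    "instances": [
        {"image": "data/images/dog.jpg", "text": "a dog running on the beach"},
        {"image": "data/images/cat.jpg", "text": "a cat sleeping on a sofa"},
    ],
}

# A loader could dispatch on the type tag to pick the right preprocessor.
assert sample["type"] == "image_text"
```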
I have fixed the code style issues.
Regarding "[Question] line 286, 294: will this cause problem for inference of encoder-decoder models?": the `pad_token_id` argument is added to the kwargs, so it will not cause problems for inference of encoder-decoder models.
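The fix can be sketched as a small helper; this is a minimal illustration with hypothetical names, assuming the resulting kwargs are forwarded to `model.generate()`:

```python
def with_pad_token_id(pad_token_id, generate_kwargs=None):
    """Ensure pad_token_id is present in the kwargs passed to generate().

    Hypothetical helper illustrating the fix: encoder-decoder models need
    an explicit pad_token_id during batched inference, so we inject it
    into the kwargs unless the caller already supplied one.
    """
    kwargs = dict(generate_kwargs or {})
    kwargs.setdefault("pad_token_id", pad_token_id)
    return kwargs
```

With this, a call like `model.generate(**with_pad_token_id(tokenizer.pad_token_id, user_kwargs))` behaves the same for decoder-only and encoder-decoder models, and a caller-supplied `pad_token_id` still wins.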
I will add more features in this PR.
I have updated the demo with a multi-modality conversation function. The example script is:

```shell
export model="Salesforce/blip2-opt-2.7b"
deepspeed examples/inference.py \
    --deepspeed configs/ds_config_multimodal.json \
    --model_name_or_path ${model} \
    --arch_type vision_encoder_decoder \
    --input_text "please describe the image:" \
    --task vqa
```