
[Feature] Implement COG-VLM2

Open isidentical opened this issue 1 year ago • 3 comments

Motivation

CogVLM2 is now the SOTA open source VLM for captioning tasks.

Related resources

No response

Additional context

No response

isidentical avatar May 20 '24 20:05 isidentical

@isidentical hi, thanks for your information. We will include cogvlm2 after pr #1502 is merged.

RunningLeon avatar May 21 '24 03:05 RunningLeon

any update?

Jayantverma2 avatar May 23 '24 16:05 Jayantverma2

any update?

hi, it's in progress. Any updates will be synced to this issue.

RunningLeon avatar May 24 '24 11:05 RunningLeon

@isidentical @Jayantverma2 hi, guys. CogVLM2 models are supported in PR #1502. If you have time, have a try. Welcome to leave any comments in the PR. THX.

RunningLeon avatar May 28 '24 04:05 RunningLeon

@RunningLeon Is this the correct way to initialize the cogvlm2?

engine = pipeline(model_path, "cogvlm2", log_level="DEBUG")

I have made some changes to config.json:

{
  "architectures": ["CogVLMForCausalLM"],
  "auto_map": {
    "AutoConfig": "configuration_cogvlm.CogVLMConfig",
    "AutoModelForCausalLM": "modeling_cogvlm.CogVLMForCausalLM"
  },
  "vision_config": {
    "dropout_prob": 0.0,
    "hidden_act": "gelu",
    "in_channels": 3,
    "num_hidden_layers": 63,
    "hidden_size": 1792,
    "patch_size": 14,
    "num_heads": 16,
    "intermediate_size": 15360,
    "layer_norm_eps": 1e-06,
    "num_positions": 9217,
    "image_size": 1344
  },
  "hidden_size": 4096,
  "intermediate_size": 14336,
  "num_attention_heads": 32,
  "max_position_embeddings": 8192,
  "rms_norm_eps": 1e-05,
  "template_version": "chat",
  "initializer_range": 0.02,
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009],
  "pad_token_id": 128002,
  "vocab_size": 128256,
  "num_hidden_layers": 32,
  "hidden_act": "silu",
  "use_cache": true,
  "transformers_version": "4.41.0"
}

But when I am running it with this prompt:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{image}'}}
        ]
    }
]

it is generating b''.

Tushar-ml avatar May 29 '24 07:05 Tushar-ml

@Tushar-ml hi, pls. follow examples in the document: https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#vlm-offline-inference-pipeline.

prompts should be like

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}}
        ]
    }
]
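For a local image, the same message shape works with a base64 data URL, as in the snippet above. A minimal sketch of a helper that builds such a prompt; `image_message` and its parameters are hypothetical names for illustration, not part of the lmdeploy API:

```python
import base64

def image_message(text, image_path):
    """Build an OpenAI-style VLM prompt with the image inlined
    as a base64 data URL (hypothetical helper, not lmdeploy API)."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': text},
            {'type': 'image_url',
             'image_url': {'url': f'data:image/jpeg;base64,{b64}'}},
        ],
    }]
```

The resulting list can then be passed to the pipeline in place of a hand-built `prompts` structure.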

RunningLeon avatar May 29 '24 08:05 RunningLeon

@RunningLeon are there any docs on how to run CogVLM2? As mentioned in the PR, the tokenizer needs to be applied manually.

Tushar-ml avatar May 29 '24 19:05 Tushar-ml

awesome, look forward to it. Really like lmdeploy because it's much more stable than sglang for these vision models.

pseudotensor avatar May 29 '24 22:05 pseudotensor

@RunningLeon are there any docs on how to run CogVLM2? As mentioned in the PR, the tokenizer needs to be applied manually.

@Tushar-ml hi, no need to do so for cogvlm2, but should do for cogvlm(1).

RunningLeon avatar May 30 '24 02:05 RunningLeon

awesome, look forward to it. Really like lmdeploy because it's much more stable than sglang for these vision models.

@pseudotensor hi, glad to hear that. If possible, please recommend lmdeploy to other people who are interested in deploying LLMs and VLMs. Thanks.

RunningLeon avatar May 30 '24 02:05 RunningLeon

awesome, look forward to it. Really like lmdeploy because it's much more stable than sglang for these vision models.

@pseudotensor hi, glad to hear that. If possible, please recommend lmdeploy to other people who are interested in deploying LLMs and VLMs. Thanks.

Yes, will gladly do that.

pseudotensor avatar May 30 '24 02:05 pseudotensor

@RunningLeon I am getting OOM on an A40 (48 GB VRAM). What is the recommended setup for CogVLM2, given the model itself is no more than 40 GB?

Tushar-ml avatar May 30 '24 04:05 Tushar-ml

@RunningLeon I am getting OOM on an A40 (48 GB VRAM). What is the recommended setup for CogVLM2, given the model itself is no more than 40 GB?

@Tushar-ml hi, could you provide your sample code? Normally, you can reduce cache_max_entry_count to shrink the KV cache memory and lower max_prefill_token_num in PytorchEngineConfig:

https://github.com/InternLM/lmdeploy/blob/5a2aaf1dc81e101c282456305546787558e509ff/lmdeploy/messages.py#L202-L230
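A sketch of how those two knobs might be passed; the full field set is defined in PytorchEngineConfig linked above, and the values and model path here are illustrative, not a tested recipe:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Shrink the KV-cache share and the prefill chunk to fit a 19B VLM in 48 GB.
backend_config = PytorchEngineConfig(
    cache_max_entry_count=0.5,   # fraction of free GPU memory used for KV cache
    max_prefill_token_num=4096,  # cap on tokens processed per prefill step
)
pipe = pipeline('THUDM/cogvlm2-llama3-chat-19B', backend_config=backend_config)
```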

RunningLeon avatar May 30 '24 08:05 RunningLeon

Thanks @RunningLeon I will try this

Tushar-ml avatar May 30 '24 09:05 Tushar-ml

@RunningLeon Hi! Due to server network limitations, I could not compile and install the latest lmdeploy on the server, so I pulled a lmdeploy 0.4.2 image from Docker Hub, ran it, and then ran cogvlm2, which reported an error:

root@gpu9:~/data/CogVLM2# python cogvlm_demo.py
2024-05-31 01:31:08,920 - lmdeploy - ERROR - TypeError: expected string or bytes-like object
2024-05-31 01:31:08,920 - lmdeploy - ERROR - test failed! model /root/data/cogvlm2-llama3-chinese-chat-19B/ requires transformers version None but transformers 4.40.2 is installed.

my code:

from lmdeploy import pipeline
from lmdeploy.vl import load_image

model_path = '/root/data/cogvlm2-llama3-chinese-chat-19B/'

pipe = pipeline(model_path)

image = load_image('/root/data/dataset/misumi_data/images/Misumi000006.jpg')
# prompt translates to: "What is the part shown in the image?"
response = pipe(('图中出现的零件是什么?', image))
print(response)

I look forward to your reply. Thank you

GuoXu-booo avatar Jun 03 '24 01:06 GuoXu-booo

@GuoXu-booo hi, cogvlm is supported in the pytorch engine, so you can simply clone the code from the PR and run pip install -e . to install it. BTW, you'd better use the latest code from PR #1502. The env check fails in your case because there is no transformers_version in your config.json, which is fixed there.

git clone --recursive -b support-cogvlm-dev https://github.com/RunningLeon/lmdeploy.git
cd lmdeploy 
pip install -e .

RunningLeon avatar Jun 03 '24 02:06 RunningLeon

@RunningLeon are there any plans to use turbomind for CogVLM, since it is faster for llama3?

isidentical avatar Jun 04 '24 20:06 isidentical

@RunningLeon are there any plans to use turbomind for CogVLM, since it is faster for llama3?

sorry. No plan yet.

RunningLeon avatar Jun 05 '24 02:06 RunningLeon