
[Feature] Implement COG-VLM2

Open isidentical opened this issue 1 year ago • 3 comments

Motivation

CogVLM2 is now the SOTA open source VLM for captioning tasks.

Related resources

No response

Additional context

No response

isidentical avatar May 20 '24 20:05 isidentical

@isidentical hi, thanks for your information. We will include cogvlm2 after pr #1502 is merged.

RunningLeon avatar May 21 '24 03:05 RunningLeon

any update?

Jayantverma2 avatar May 23 '24 16:05 Jayantverma2

any update?

hi, it's in progress. Any updates will be synced to this issue.

RunningLeon avatar May 24 '24 11:05 RunningLeon

@isidentical @Jayantverma2 hi, guys. CogVLM2 models are supported in PR #1502. If you have time, have a try. Welcome to leave any comments in the PR. THX.

RunningLeon avatar May 28 '24 04:05 RunningLeon

@RunningLeon Is this the correct way to initialize the cogvlm2?

engine = pipeline(model_path, "cogvlm2", log_level="DEBUG")

I have made some changes to config.json:

{
  "architectures": ["CogVLMForCausalLM"],
  "auto_map": {
    "AutoConfig": "configuration_cogvlm.CogVLMConfig",
    "AutoModelForCausalLM": "modeling_cogvlm.CogVLMForCausalLM"
  },
  "vision_config": {
    "dropout_prob": 0.0,
    "hidden_act": "gelu",
    "in_channels": 3,
    "num_hidden_layers": 63,
    "hidden_size": 1792,
    "patch_size": 14,
    "num_heads": 16,
    "intermediate_size": 15360,
    "layer_norm_eps": 1e-06,
    "num_positions": 9217,
    "image_size": 1344
  },
  "hidden_size": 4096,
  "intermediate_size": 14336,
  "num_attention_heads": 32,
  "max_position_embeddings": 8192,
  "rms_norm_eps": 1e-05,
  "template_version": "chat",
  "initializer_range": 0.02,
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009],
  "pad_token_id": 128002,
  "vocab_size": 128256,
  "num_hidden_layers": 32,
  "hidden_act": "silu",
  "use_cache": true,
  "transformers_version": "4.41.0"
}

But when I am running it with this prompt:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{image}'}}
        ]
    }
]

it is generating b''.

Tushar-ml avatar May 29 '24 07:05 Tushar-ml

@Tushar-ml hi, pls. follow examples in the document: https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#vlm-offline-inference-pipeline.

prompts should be like

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}}
        ]
    }
]
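For a local image, the same message shape works with a base64 data URL, as in the snippet above. A minimal sketch of a helper that builds such a prompt; `image_message` and its parameters are hypothetical names for illustration, not part of the lmdeploy API:

```python
import base64

def image_message(text, image_path):
    """Build an OpenAI-style VLM prompt with the image inlined
    as a base64 data URL (hypothetical helper, not lmdeploy API)."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': text},
            {'type': 'image_url',
             'image_url': {'url': f'data:image/jpeg;base64,{b64}'}},
        ],
    }]
```

The resulting list can then be passed to the pipeline in place of a hand-built `prompts` structure.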

RunningLeon avatar May 29 '24 08:05 RunningLeon

@RunningLeon are there any docs on how to run CogVLM2? As mentioned in the PR, the tokenizer needs to be applied manually.

Tushar-ml avatar May 29 '24 19:05 Tushar-ml

awesome, look forward to it. Really like lmdeploy because it's much more stable than sglang for these vision models.

pseudotensor avatar May 29 '24 22:05 pseudotensor

@RunningLeon are there any docs on how to run CogVLM2? As mentioned in the PR, the tokenizer needs to be applied manually.

@Tushar-ml hi, no need to do so for cogvlm2, but should do for cogvlm(1).

RunningLeon avatar May 30 '24 02:05 RunningLeon

awesome, look forward to it. Really like lmdeploy because it's much more stable than sglang for these vision models.

@pseudotensor hi, glad to hear that. If possible, please recommend lmdeploy to other people who are interested in deploying LLMs and VLMs. Thanks.

RunningLeon avatar May 30 '24 02:05 RunningLeon

awesome, look forward to it. Really like lmdeploy because it's much more stable than sglang for these vision models.

@pseudotensor hi, glad to hear that. If possible, please recommend lmdeploy to other people who are interested in deploying LLMs and VLMs. Thanks.

Yes, will gladly do that.

pseudotensor avatar May 30 '24 02:05 pseudotensor

@RunningLeon I am getting OOM on an A40 (48 GB VRAM). What is the recommended setup for CogVLM2, given the model itself is no more than 40 GB?

Tushar-ml avatar May 30 '24 04:05 Tushar-ml

@RunningLeon I am getting OOM on an A40 (48 GB VRAM). What is the recommended setup for CogVLM2, given the model itself is no more than 40 GB?

@Tushar-ml hi, could you provide your sample code? Normally, you can reduce cache_max_entry_count to shrink the KV cache memory and lower max_prefill_token_num in PytorchEngineConfig:

https://github.com/InternLM/lmdeploy/blob/5a2aaf1dc81e101c282456305546787558e509ff/lmdeploy/messages.py#L202-L230
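A sketch of how those two knobs might be passed; the full field set is defined in PytorchEngineConfig linked above, and the values and model path here are illustrative, not a tested recipe:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Shrink the KV-cache share and the prefill chunk to fit a 19B VLM in 48 GB.
backend_config = PytorchEngineConfig(
    cache_max_entry_count=0.5,   # fraction of free GPU memory used for KV cache
    max_prefill_token_num=4096,  # cap on tokens processed per prefill step
)
pipe = pipeline('THUDM/cogvlm2-llama3-chat-19B', backend_config=backend_config)
```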

RunningLeon avatar May 30 '24 08:05 RunningLeon

Thanks @RunningLeon I will try this

Tushar-ml avatar May 30 '24 09:05 Tushar-ml

@RunningLeon Hi! Due to server network limitations, I could not compile and install the latest lmdeploy on the server, so I pulled a lmdeploy 0.4.2 image from Docker Hub, ran it, and then ran cogvlm2, which reported an error:

root@gpu9:~/data/CogVLM2# python cogvlm_demo.py
2024-05-31 01:31:08,920 - lmdeploy - ERROR - TypeError: expected string or bytes-like object
2024-05-31 01:31:08,920 - lmdeploy - ERROR - test failed! model /root/data/cogvlm2-llama3-chinese-chat-19B/ requires transformers version None but transformers 4.40.2 is installed.

my code:

from lmdeploy import pipeline
from lmdeploy.vl import load_image

model_path = '/root/data/cogvlm2-llama3-chinese-chat-19B/'

pipe = pipeline(model_path)

image = load_image('/root/data/dataset/misumi_data/images/Misumi000006.jpg')
# prompt translates to: "What is the part shown in the image?"
response = pipe(('图中出现的零件是什么?', image))
print(response)

I look forward to your reply. Thank you

GuoXu-booo avatar Jun 03 '24 01:06 GuoXu-booo

@GuoXu-booo hi, cogvlm is supported in the pytorch engine, so you can simply clone the code from the PR and run pip install -e . to install it. BTW, you'd better use the latest code from PR #1502. The env check fails in your case because there is no transformers_version in your config.json, which is fixed there.

git clone --recursive -b support-cogvlm-dev https://github.com/RunningLeon/lmdeploy.git
cd lmdeploy 
pip install -e .

RunningLeon avatar Jun 03 '24 02:06 RunningLeon

@RunningLeon are there any plans to use turbomind for CogVLM, since it is faster for llama3?

isidentical avatar Jun 04 '24 20:06 isidentical

@RunningLeon are there any plans to use turbomind for CogVLM, since it is faster for llama3?

sorry. No plan yet.

RunningLeon avatar Jun 05 '24 02:06 RunningLeon