CogVLM

Chat using one image and three prompts

Open · nlylmz opened this issue 10 months ago · 1 comment

System Info / 系統信息

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/cogvlm-chat-hf',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to('cuda').eval()

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • [ ] The official example scripts / 官方的示例脚本
  • [ ] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Following the chat example below, I want to hold a conversation that starts with one image and then sends three different prompts, waiting for the model's response to each prompt before sending the next one. Please help me implement sending multiple prompts in sequence. I believe I need to use `history`, but I couldn't get it to work.

  • The first query has an image and a text prompt.
  • The second query has only a text prompt.
  • The third query has only a text prompt.

chat example

import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/cogvlm-chat-hf',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to('cuda').eval()

query = 'Describe this image'
image = Image.open(requests.get(
    'https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true',
    stream=True).raw).convert('RGB')
inputs = model.build_conversation_input_ids(
    tokenizer, query=query, history=[], images=[image])  # chat mode
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
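One way to chain several prompts is to wrap the single-turn code above in a helper that returns both the response and the updated history, then call it once per prompt. This is an untested sketch: it assumes the `build_conversation_input_ids` signature shown in the example and re-sends the same image on every turn (which is how the CogVLM CLI demo handles multi-turn chat); the exact behavior may differ between model revisions.

```python
import torch

def chat_turn(model, tokenizer, query, history, image, device='cuda'):
    """Run one chat turn and return (response, updated_history).

    The same image is passed on every turn; the text context is carried
    by the growing history list of (query, response) pairs.
    """
    inputs = model.build_conversation_input_ids(
        tokenizer, query=query, history=history, images=[image])
    inputs = {
        'input_ids': inputs['input_ids'].unsqueeze(0).to(device),
        'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to(device),
        'attention_mask': inputs['attention_mask'].unsqueeze(0).to(device),
        'images': [[inputs['images'][0].to(device).to(torch.bfloat16)]],
    }
    with torch.no_grad():
        outputs = model.generate(**inputs, max_length=2048, do_sample=False)
        # Strip the prompt tokens, keeping only the newly generated ones.
        outputs = outputs[:, inputs['input_ids'].shape[1]:]
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response, history + [(query, response)]

# Usage (after the model/tokenizer/image setup above; the two follow-up
# prompts are placeholders for your own second and third questions):
# history = []
# for query in ['Describe this image',
#               'What colors dominate the image?',
#               'Suggest a short caption.']:
#     response, history = chat_turn(model, tokenizer, query, history, image)
#     print(response)
```

Each call feeds the accumulated `(query, response)` pairs back through `history`, so the second and third prompts see the full conversation so far without needing a new image argument of their own.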

Expected behavior / 期待表现

Three answers, one for each of the three text prompts.

nlylmz · Apr 23 '24 22:04