
Are there any scripts for inference? Not the Gradio demo

Open LingoAmber opened this issue 2 years ago • 7 comments


LingoAmber avatar Aug 29 '23 10:08 LingoAmber

Me too, I want to use MiniGPT-4 for batched inference.

chenxinli001 avatar Sep 08 '23 08:09 chenxinli001

@XGGNet @LingoAmber did you guys find anything?

sushilkhadkaanon avatar Sep 20 '23 07:09 sushilkhadkaanon

I want this too.

dirtycomputer avatar Oct 11 '23 23:10 dirtycomputer

Same question.

wyuzh avatar Nov 02 '23 02:11 wyuzh

please check this issue!

dirtycomputer avatar Nov 02 '23 09:11 dirtycomputer

Any update on this?

HarryWang355 avatar Jul 21 '24 03:07 HarryWang355

Hi, I just finished a single-image inference script. Hope it can help you.

First, create a config file in eval_configs:

model:
  arch: minigpt4
  model_type: pretrain_llama2
  max_txt_len: 160
  end_sym: ""
  low_resource: True
  prompt_template: '[INST] {} [/INST] '
  llama_model: "/workspace/MiniGPT-4/ckpt/Llama-2-7b-chat-hf"
  ckpt: "/workspace/MiniGPT-4/ckpt/pretrained_minigpt4_llama2_7b.pth"

datasets:
  cc_sbu_align:
    vis_processor:
      train:
        name: "blip2_image_eval"
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"

run:
  task: image_text_pretrain

Then, create an inference.py file in the root dir:

import torch
from PIL import Image
from minigpt4.common.config import Config
from minigpt4.common.eval_utils import prepare_texts, eval_parser, init_model
from minigpt4.common.registry import registry
from minigpt4.conversation.conversation import CONV_VISION_minigptv2


def inference(model, vis_processor, image_path, prompt):
    model.eval()
    # Load and preprocess the image
    raw_image = Image.open(image_path).convert('RGB')
    image = vis_processor(raw_image).unsqueeze(0).to(torch.device("cuda"))

    # Generate the answer
    output = model.generate(image, prompt, max_new_tokens=300)

    return output[0]


if __name__ == "__main__":
    parser = eval_parser()
    args = parser.parse_args()
    model, vis_processor = init_model(args)

    image_path = "./examples/fun_2.png"
    prompt = "What is the emotional state of the content in the image? Please tell me the reason."
    # Prepare the conversation template
    question = f"[vqa] Based on the image, respond to this question with a detailed answer: {prompt}"
    conv_temp = CONV_VISION_minigptv2.copy()
    conv_temp.system = ""
    text = prepare_texts(question, conv_temp)

    result = inference(model, vis_processor, image_path, text)
    print("Output:", result)

You can run it by: python inference.py --cfg-path ./eval_configs/minigpt4_inference_llama2.yaml
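
Since several people in this thread asked about batched inference specifically, here is a minimal sketch that extends the script above to process a list of images in batches. It assumes model.generate accepts a batched image tensor together with a matching list of prompts (the single-image call above effectively passes a batch of size 1), and that prepare_texts takes a list of questions and returns one formatted prompt per item; both are assumptions on my part, so please verify them against your checkout before relying on this.

import torch
from PIL import Image
from minigpt4.common.eval_utils import prepare_texts, eval_parser, init_model
from minigpt4.conversation.conversation import CONV_VISION_minigptv2


def batched_inference(model, vis_processor, image_paths, questions, batch_size=4):
    model.eval()
    conv_temp = CONV_VISION_minigptv2.copy()
    conv_temp.system = ""
    results = []
    for start in range(0, len(image_paths), batch_size):
        paths = image_paths[start:start + batch_size]
        batch_questions = questions[start:start + batch_size]
        # Preprocess each image and stack into a single (B, C, H, W) tensor
        images = torch.stack(
            [vis_processor(Image.open(p).convert("RGB")) for p in paths]
        ).to(torch.device("cuda"))
        # Wrap every question in the same conversation template used above
        # (assumption: prepare_texts accepts a list and returns one prompt per item)
        texts = prepare_texts(
            [f"[vqa] Based on the image, respond to this question with a detailed answer: {q}"
             for q in batch_questions],
            conv_temp,
        )
        # Assumption: generate() handles a batch dimension larger than 1
        outputs = model.generate(images, texts, max_new_tokens=300)
        results.extend(outputs)
    return results


if __name__ == "__main__":
    parser = eval_parser()
    args = parser.parse_args()
    model, vis_processor = init_model(args)

    # Hypothetical example inputs; replace with your own images and questions
    image_paths = ["./examples/fun_2.png", "./examples/fun_2.png"]
    questions = ["What is the emotional state of the content in the image?"] * len(image_paths)

    for path, answer in zip(image_paths, batched_inference(model, vis_processor, image_paths, questions)):
        print(path, "->", answer)

You can launch it the same way as the single-image script, passing the same --cfg-path config.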

yuffffff116 avatar Sep 07 '24 10:09 yuffffff116


How can I know which model_type I should fill in?

rrryan2016 avatar Nov 18 '25 11:11 rrryan2016