MiniGPT-4
Are there any scripts for inference? Not the Gradio demo.
Me too, I want to use MiniGPT-4 for batched inference.
@XGGNet @LingoAmber did you guys find anything?
I want this too.
Same question.
Please check this issue!
Any update on this?
Hi, I just finished a single-image inference script. Hope it can help you.
First, create a config file in eval_configs, e.g. eval_configs/minigpt4_inference_llama2.yaml:
model:
  arch: minigpt4
  model_type: pretrain_llama2
  max_txt_len: 160
  end_sym: "</s>"
  low_resource: True
  prompt_template: '[INST] {} [/INST] '
  llama_model: "/workspace/MiniGPT-4/ckpt/Llama-2-7b-chat-hf"
  ckpt: "/workspace/MiniGPT-4/ckpt/pretrained_minigpt4_llama2_7b.pth"

datasets:
  cc_sbu_align:
    vis_processor:
      train:
        name: "blip2_image_eval"
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"

run:
  task: image_text_pretrain
Then, create an inference.py file in the root directory:
import torch
from PIL import Image
from minigpt4.common.config import Config
from minigpt4.common.eval_utils import prepare_texts, eval_parser, init_model
from minigpt4.common.registry import registry
from minigpt4.conversation.conversation import CONV_VISION_minigptv2

def inference(model, vis_processor, image_path, prompt):
    model.eval()
    # Load and preprocess the image
    raw_image = Image.open(image_path).convert('RGB')
    image = vis_processor(raw_image).unsqueeze(0).to(torch.device("cuda"))

    # Generate the answer
    output = model.generate(image, prompt, max_new_tokens=300)
    return output[0]

if __name__ == "__main__":
    parser = eval_parser()
    args = parser.parse_args()
    model, vis_processor = init_model(args)

    image_path = "./examples/fun_2.png"
    prompt = "What is the emotional state of the content in the image? Please tell me the reason."

    # Prepare the conversation template
    question = f"[vqa] Based on the image, respond to this question with a detailed answer: {prompt}"
    conv_temp = CONV_VISION_minigptv2.copy()
    conv_temp.system = ""
    text = prepare_texts([question], conv_temp)  # prepare_texts expects a list of questions

    result = inference(model, vis_processor, image_path, text)
    print("Output:", result)
You can run it with:
python inference.py --cfg-path ./eval_configs/minigpt4_inference_llama2.yaml
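Since several people above asked about batched inference: below is a minimal sketch of how the same pieces could be extended to a batch of images. It assumes model.generate accepts a stacked image tensor together with a list of prompts of the same length (that is how the repo's eval scripts call it), so treat it as a starting point rather than tested code. The two image paths simply reuse the example image to show the batching.

import torch
from PIL import Image
from minigpt4.common.eval_utils import prepare_texts, eval_parser, init_model
from minigpt4.conversation.conversation import CONV_VISION_minigptv2

def batched_inference(model, vis_processor, image_paths, questions, max_new_tokens=300):
    model.eval()
    # Stack the preprocessed images into one (N, C, H, W) batch tensor
    images = torch.stack(
        [vis_processor(Image.open(p).convert('RGB')) for p in image_paths]
    ).to(torch.device("cuda"))
    # Wrap each question in the conversation template, one prompt per image
    conv_temp = CONV_VISION_minigptv2.copy()
    conv_temp.system = ""
    texts = prepare_texts(questions, conv_temp)
    with torch.no_grad():
        return model.generate(images, texts, max_new_tokens=max_new_tokens)

if __name__ == "__main__":
    parser = eval_parser()
    args = parser.parse_args()
    model, vis_processor = init_model(args)
    image_paths = ["./examples/fun_2.png", "./examples/fun_2.png"]  # same image twice, only to demo batching
    questions = ["[vqa] Based on the image, respond to this question with a detailed answer: "
                 "What is the emotional state of the content in the image?"] * len(image_paths)
    for path, answer in zip(image_paths, batched_inference(model, vis_processor, image_paths, questions)):
        print(path, "->", answer)

It runs with the same --cfg-path config as the single-image script.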
How can I know which model_type I should fill in?
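model_type has to be one of the keys registered for the arch you chose, and it should match the LLM weights that llama_model points to. Assuming the current repo layout, where each model class keeps that mapping in PRETRAINED_MODEL_CONFIG_DICT (an assumption about the codebase, so check your version), you can list the accepted values like this:

# Assumption: the MiniGPT4 class exposes PRETRAINED_MODEL_CONFIG_DICT, mapping
# model_type keys to their config files; print the keys to see the valid values.
from minigpt4.models.minigpt4 import MiniGPT4

print(list(MiniGPT4.PRETRAINED_MODEL_CONFIG_DICT.keys()))
# expected output is something like ['pretrain_vicuna0', 'pretrain_llama2']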