MiniGPT-4
Are there any scripts for inference? Not the Gradio demo.
Me too, I want to use MiniGPT-4 for batched inference.
@XGGNet @LingoAmber did you guys find anything?
I want this too.
Same question.
Please check this issue!
Any update on this?
Hi, I just finished a single-image inference script. Hope it can help you.
First, create a config file in eval_configs, e.g. eval_configs/minigpt4_inference_llama2.yaml:
model:
  arch: minigpt4
  model_type: pretrain_llama2
  max_txt_len: 160
  end_sym: "</s>"
  low_resource: True
  prompt_template: '[INST] {} [/INST] '
  llama_model: "/workspace/MiniGPT-4/ckpt/Llama-2-7b-chat-hf"
  ckpt: "/workspace/MiniGPT-4/ckpt/pretrained_minigpt4_llama2_7b.pth"

datasets:
  cc_sbu_align:
    vis_processor:
      train:
        name: "blip2_image_eval"
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"

run:
  task: image_text_pretrain
Then, create an inference.py file in the root directory:
import torch
from PIL import Image
from minigpt4.common.config import Config
from minigpt4.common.eval_utils import prepare_texts, eval_parser, init_model
from minigpt4.common.registry import registry
from minigpt4.conversation.conversation import CONV_VISION_minigptv2

def inference(model, vis_processor, image_path, prompt):
    model.eval()
    # Load and preprocess the image
    raw_image = Image.open(image_path).convert('RGB')
    image = vis_processor(raw_image).unsqueeze(0).to(torch.device("cuda"))

    # Generate the answer
    output = model.generate(image, prompt, max_new_tokens=300)
    return output[0]

if __name__ == "__main__":
    parser = eval_parser()
    args = parser.parse_args()
    model, vis_processor = init_model(args)

    image_path = "./examples/fun_2.png"
    prompt = "What is the emotional state of the content in the image? Please tell me the reason."

    # Prepare the conversation template
    question = f"[vqa] Based on the image, respond to this question with a detailed answer: {prompt}"
    conv_temp = CONV_VISION_minigptv2.copy()
    conv_temp.system = ""
    text = prepare_texts([question], conv_temp)  # prepare_texts expects a list of questions

    result = inference(model, vis_processor, image_path, text)
    print("Output:", result)
You can run it with:
python inference.py --cfg-path ./eval_configs/minigpt4_inference_llama2.yaml
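Since several people above asked about batched inference: below is a minimal sketch of how the same pieces could be extended to a batch of images. It assumes model.generate accepts a stacked image tensor together with a list of prompts of the same length (that is how the repo's eval scripts call it), so treat it as a starting point rather than tested code. The two image paths simply reuse the example image to show the batching.

import torch
from PIL import Image
from minigpt4.common.eval_utils import prepare_texts, eval_parser, init_model
from minigpt4.conversation.conversation import CONV_VISION_minigptv2

def batched_inference(model, vis_processor, image_paths, questions, max_new_tokens=300):
    model.eval()
    # Stack the preprocessed images into one (N, C, H, W) batch tensor
    images = torch.stack(
        [vis_processor(Image.open(p).convert('RGB')) for p in image_paths]
    ).to(torch.device("cuda"))
    # Wrap each question in the conversation template, one prompt per image
    conv_temp = CONV_VISION_minigptv2.copy()
    conv_temp.system = ""
    texts = prepare_texts(questions, conv_temp)
    with torch.no_grad():
        return model.generate(images, texts, max_new_tokens=max_new_tokens)

if __name__ == "__main__":
    parser = eval_parser()
    args = parser.parse_args()
    model, vis_processor = init_model(args)
    image_paths = ["./examples/fun_2.png", "./examples/fun_2.png"]  # same image twice, only to demo batching
    questions = ["[vqa] Based on the image, respond to this question with a detailed answer: "
                 "What is the emotional state of the content in the image?"] * len(image_paths)
    for path, answer in zip(image_paths, batched_inference(model, vis_processor, image_paths, questions)):
        print(path, "->", answer)

It runs with the same --cfg-path config as the single-image script.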
How can I know which model_type I should fill in?
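model_type has to be one of the keys registered for the arch you chose, and it should match the LLM weights that llama_model points to. Assuming the current repo layout, where each model class keeps that mapping in PRETRAINED_MODEL_CONFIG_DICT (an assumption about the codebase, so check your version), you can list the accepted values like this:

# Assumption: the MiniGPT4 class exposes PRETRAINED_MODEL_CONFIG_DICT, mapping
# model_type keys to their config files; print the keys to see the valid values.
from minigpt4.models.minigpt4 import MiniGPT4

print(list(MiniGPT4.PRETRAINED_MODEL_CONFIG_DICT.keys()))
# expected output is something like ['pretrain_vicuna0', 'pretrain_llama2']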