Request for CLI-based inference code for MiniGPT-v2 instead of the Gradio web interface.
For MiniGPT-v2, I've executed the following code to perform CLI-based inference, but I would greatly appreciate it if you could provide official CLI-based inference code for more straightforward usage:
chat = Chat(model, vis_processor, device=device)
gr_img = '0005.jpg'
chat_state = CONV_VISION.copy()
img_list = []
user_message = '[grounding] describe this image in detail'
chat.upload_img(gr_img, chat_state, img_list)  # queue the raw image and add the image placeholder to the conversation
chat.ask(user_message, chat_state)             # append the user turn
chat.encode_img(img_list)                      # run the visual encoder over the queued image
llm_message = chat.answer(conv=chat_state,
                          img_list=img_list,
                          temperature=1.5,
                          max_new_tokens=500,
                          max_length=2000)[0]
print(llm_message)
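If it helps others, the same flow can be wrapped in a small helper so that each image gets a fresh conversation state. This is only a sketch built from the snippet above, assuming the same Chat instance and CONV_VISION template:

def describe_image(chat, conv_template, img_path, prompt,
                   temperature=1.5, max_new_tokens=500, max_length=2000):
    # Fresh conversation state and image list per request.
    chat_state = conv_template.copy()
    img_list = []
    chat.upload_img(img_path, chat_state, img_list)
    chat.ask(prompt, chat_state)
    chat.encode_img(img_list)
    return chat.answer(conv=chat_state,
                       img_list=img_list,
                       temperature=temperature,
                       max_new_tokens=max_new_tokens,
                       max_length=max_length)[0]

# Usage, given the objects from above:
# print(describe_image(chat, CONV_VISION, '0005.jpg',
#                      '[grounding] describe this image in detail'))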
Can you run this code and get reasonable output? I used it, but the model didn't produce reasonable output. I also tried writing a test script myself, but again I couldn't get reasonable output.
At first I couldn't; it said something like the image is blank. After I installed the correct transformers version suggested in environment.yaml, it works well.
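For anyone comparing environments, a quick way to confirm which versions are actually loaded is the snippet below; compare the printed values against the pins in environment.yaml, which is the authoritative source for the exact versions:

import torch
import transformers

# Compare these against the pins in environment.yaml.
print('torch:', torch.__version__)
print('transformers:', transformers.__version__)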
That's really strange. My version is not strictly consistent with environment.yaml, but I checked my input and everything was fine. I asked the question in #386, which is very similar to #381; the model's output is not reasonable.
What results do you get with "examples_v2/office.jpg" and '[grounding] describe this image in detail'?
Thank you very much for your reply.
I think the transformers version is exactly the reason; try changing the version and see what happens.
I've changed the transformers version, but the output is still the same. It doesn't make sense.
Would it be convenient for you to provide your complete test code? Thank you very much.
Sure, I'll send it to you by email.
Thank you very much! [email protected]
I used this code, but it reported an error. Why? I would be very grateful if you could provide your complete code. [email protected] Below is my code:

import argparse
from enum import auto, Enum

from minigpt4.common.config import Config
from minigpt4.common.dist_utils import get_rank
from minigpt4.common.registry import registry
from minigpt4.conversation.conversation import Chat, Conversation

# imports modules for registration
from minigpt4.datasets.builders import *
from minigpt4.models import *
from minigpt4.processors import *
from minigpt4.runners import *
from minigpt4.tasks import *


class SeparatorStyle(Enum):
    """Different separator styles."""
    SINGLE = auto()
    TWO = auto()


def parse_args():
    parser = argparse.ArgumentParser(description="Demo")
    parser.add_argument("--cfg_path", default='eval_configs/minigpt4v2_eval.yaml', help="path to configuration file.")
    parser.add_argument("--img_path", default='', help="path to an input image.")
    parser.add_argument("--gpu_id", type=int, default=0, help="specify the gpu to load the model.")
    parser.add_argument("--num_beams", type=int, default=1)
    parser.add_argument("--temperature", type=int, default=1)
    parser.add_argument(
        "--options",
        nargs="+",
        help="override some settings in the used config, the key-value pair "
             "in xxx=yyy format will be merged into config file (deprecate), "
             "change to --cfg-options instead.",
    )
    args = parser.parse_args()
    return args
def main():
    # ========================================
    # Model Initialization
    # ========================================
    print('Initializing model')

    args = parse_args()
    cfg = Config(args)
    args.img_path = 'images/1.jpg'

    model_config = cfg.model_cfg
    model_config.device_8bit = args.gpu_id
    model_cls = registry.get_model_class(model_config.arch)
    model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))

    vis_processor_cfg = cfg.datasets_cfg.cc_sbu_align.vis_processor.train
    vis_processor = registry.get_processor_class(vis_processor_cfg.name).from_config(vis_processor_cfg)
    chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))
    print('Model Initialization Finished')

    CONV_VISION = Conversation(
        system="",
        roles=(r"<s>[INST] ", r" [/INST]"),
        messages=[],
        offset=2,
        sep_style=SeparatorStyle.SINGLE,
        sep="",
    )

    # # upload image
    # chat_state = CONV_VISION.copy()
    # img_list = []
    # llm_message = chat.upload_img(args.img_path, chat_state, img_list)
    # print(llm_message)
    #
    # # ask a question
    # user_message = "what is this image about?"
    # chat.ask(user_message, chat_state)
    #
    # # get answer
    # llm_message = chat.answer(conv=chat_state,
    #                           img_list=img_list,
    #                           num_beams=args.num_beams,
    #                           temperature=args.temperature,
    #                           max_new_tokens=500,
    #                           max_length=2000)[0]
    #
    # print(llm_message)

    chat = Chat(model, vis_processor)  # note: this re-creates Chat without the device argument used above
    gr_img = 'images/sofa.jpg'
    chat_state = CONV_VISION.copy()
    img_list = []
    user_message = '[grounding] describe this image in detail'
    chat.upload_img(gr_img, chat_state, img_list)
    chat.ask(user_message, chat_state)
    chat.encode_img(img_list)
    llm_message = chat.answer(conv=chat_state,
                              img_list=img_list,
                              temperature=1.5,
                              max_new_tokens=500,
                              max_length=2000)[0]
    print(llm_message)


if __name__ == "__main__":
    main()
Error message:
Position interpolate from 16x16 to 32x32
Load Minigpt-4-LLM Checkpoint: /root/autodl-tmp/minigptv2_checkpoint.pth
Model Initialization Finished
Traceback (most recent call last):
File "/root/MiniGPT-4/CLI.py", line 106, in
@ZhanYang-nwpu Can you share your code here?
What I did was just replace the code in demo_v2.py from line 520 to the end with the code above.
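For anyone attempting the same approach, here is a rough sketch of what that replacement can look like. It assumes demo_v2.py has already built model, vis_processor, args, and the CONV_VISION template above line 520; treat it as a sketch, not official code:

chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))

while True:
    img_path = input('image path (empty to quit): ').strip()
    if not img_path:
        break
    prompt = input('prompt: ').strip()

    # Fresh conversation state per image, mirroring the snippets above.
    chat_state = CONV_VISION.copy()
    img_list = []
    chat.upload_img(img_path, chat_state, img_list)
    chat.ask(prompt, chat_state)
    chat.encode_img(img_list)
    llm_message = chat.answer(conv=chat_state,
                              img_list=img_list,
                              temperature=1.0,
                              max_new_tokens=500,
                              max_length=2000)[0]
    print(llm_message)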
Please help: with the current Vision-CAIR/MiniGPT-4, the output in the Gradio interface is garbled. How can I fix this? Thank you. My issue: https://github.com/Vision-CAIR/MiniGPT-4/issues/422
Could you please share this part of the code so I can learn from it? Thank you! ([email protected])