
Request for CLI-based inference code for MiniGPT-v2 instead of the Gradio web interface.

Open uyo9ko opened this issue 2 years ago • 12 comments

For MiniGPT-v2, I've executed the following code to perform CLI-based inference, but I would greatly appreciate it if you could provide official CLI-based inference code for more straightforward usage:

chat = Chat(model, vis_processor, device=device)
gr_img = '0005.jpg'
chat_state = CONV_VISION.copy()
img_list = []
user_message = '[grounding] describe this image in detail'
chat.upload_img(gr_img, chat_state, img_list)
chat.ask(user_message, chat_state)
chat.encode_img(img_list)
llm_message = chat.answer(conv=chat_state,
                          img_list=img_list,
                          temperature=1.5,
                          max_new_tokens=500,
                          max_length=2000)[0]
print(llm_message)

uyo9ko avatar Oct 18 '23 06:10 uyo9ko
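
Since the issue asks for CLI-based inference, here is a minimal sketch of how the same calls could be wrapped into a multi-turn command-line loop. It assumes the model, vis_processor, device, and CONV_VISION objects are set up exactly as in the snippet above, and uses only the Chat methods that appear in this thread; it is a sketch, not official code.

# A minimal multi-turn CLI loop built on the same Chat calls as above.
# Assumes model, vis_processor, device, and CONV_VISION already exist.
chat = Chat(model, vis_processor, device=device)
chat_state = CONV_VISION.copy()
img_list = []
chat.upload_img('0005.jpg', chat_state, img_list)  # image path is illustrative
chat.encode_img(img_list)  # encode once; later turns reuse the embeddings

while True:
    user_message = input('user> ')
    if user_message.strip().lower() in ('exit', 'quit'):
        break
    chat.ask(user_message, chat_state)
    llm_message = chat.answer(conv=chat_state,
                              img_list=img_list,
                              temperature=1.0,
                              max_new_tokens=500,
                              max_length=2000)[0]
    print(llm_message)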

Can you run this code and get a reasonable output? I used this code, but the model didn't produce a reasonable output. I also tried writing a test script myself, but again I couldn't get a reasonable output.

ZhanYang-nwpu avatar Oct 19 '23 01:10 ZhanYang-nwpu

At first I couldn't; it said something like the image is blank. After I installed the transformers version suggested in environment.yaml, it worked well.

uyo9ko avatar Oct 19 '23 02:10 uyo9ko
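
For anyone hitting the same "image is blank" output, a quick sanity check is to confirm the installed transformers version against the pin in environment.yaml before digging further. A minimal check:

# Print the installed transformers version so it can be compared by hand
# against the version pinned in the repo's environment.yaml.
import transformers
print(transformers.__version__)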

That's really strange. My transformers version is not strictly consistent with environment.yaml, but I checked my input and everything was fine. I asked the question in #386, which is very similar to #381. The model's output is not reasonable.

What results do you get with "examples_v2/office.jpg" and '[grounding] describe this image in detail'?

Thank you very much for your reply.

ZhanYang-nwpu avatar Oct 19 '23 02:10 ZhanYang-nwpu

I think the transformers version really is the reason; try changing the version and see what happens.

uyo9ko avatar Oct 19 '23 03:10 uyo9ko

I've changed the transformers version, but the output is still the same. It doesn't make sense.

Would it be convenient for you to provide your complete test code? Thank you very much.

ZhanYang-nwpu avatar Oct 19 '23 03:10 ZhanYang-nwpu

Sure, I'll send it to you by email.

uyo9ko avatar Oct 19 '23 03:10 uyo9ko

Thank you very much! [email protected]

ZhanYang-nwpu avatar Oct 19 '23 03:10 ZhanYang-nwpu

I used this code, but it reported an error. Why? I would be very grateful if you could provide your complete code. [email protected] Below is my code:

import argparse

from minigpt4.common.config import Config
from minigpt4.common.dist_utils import get_rank
from minigpt4.common.registry import registry
from minigpt4.conversation.conversation import Chat, Conversation
from enum import auto, Enum

# imports modules for registration
from minigpt4.datasets.builders import *
from minigpt4.models import *
from minigpt4.processors import *
from minigpt4.runners import *
from minigpt4.tasks import *


class SeparatorStyle(Enum):
    """Different separator style."""
    SINGLE = auto()
    TWO = auto()


def parse_args():
    parser = argparse.ArgumentParser(description="Demo")
    parser.add_argument("--cfg_path", default='eval_configs/minigpt4v2_eval.yaml', help="path to configuration file.")
    parser.add_argument("--img_path", default='', help="path to an input image.")
    parser.add_argument("--gpu_id", type=int, default=0, help="specify the gpu to load the model.")
    parser.add_argument("--num_beams", type=int, default=1)
    parser.add_argument("--temperature", type=int, default=1)
    parser.add_argument(
        "--options",
        nargs="+",
        help="override some settings in the used config, the key-value pair "
             "in xxx=yyy format will be merged into config file (deprecate), "
             "change to --cfg-options instead.",
    )
    args = parser.parse_args()
    return args


def main():
    # ========================================
    # Model Initialization
    # ========================================
    print('Initializing model')
    args = parse_args()
    cfg = Config(args)

    args.img_path = 'images/1.jpg'

    model_config = cfg.model_cfg
    model_config.device_8bit = args.gpu_id
    model_cls = registry.get_model_class(model_config.arch)
    model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))

    vis_processor_cfg = cfg.datasets_cfg.cc_sbu_align.vis_processor.train
    vis_processor = registry.get_processor_class(vis_processor_cfg.name).from_config(vis_processor_cfg)
    chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))
    print('Model Initialization Finished')

    CONV_VISION = Conversation(
        system="",
        roles=(r"<s>[INST] ", r" [/INST]"),
        messages=[],
        offset=2,
        sep_style=SeparatorStyle.SINGLE,
        sep="",
    )

    # # upload image
    # chat_state = CONV_VISION.copy()
    # img_list = []
    # llm_message = chat.upload_img(args.img_path, chat_state, img_list)
    # print(llm_message)
    #
    # # ask a question
    # user_message = "what is this image about?"
    # chat.ask(user_message, chat_state)
    #
    # # get answer
    # llm_message = chat.answer(conv=chat_state,
    #                           img_list=img_list,
    #                           num_beams=args.num_beams,
    #                           temperature=args.temperature,
    #                           max_new_tokens=500,
    #                           max_length=2000)[0]
    # print(llm_message)

    chat = Chat(model, vis_processor)
    gr_img = 'images/sofa.jpg'
    chat_state = CONV_VISION.copy()
    img_list = []
    user_message = '[grounding] describe this image in detail'
    chat.upload_img(gr_img, chat_state, img_list)
    chat.ask(user_message, chat_state)
    chat.encode_img(img_list)
    llm_message = chat.answer(conv=chat_state,
                              img_list=img_list,
                              temperature=1.5,
                              max_new_tokens=500,
                              max_length=2000)[0]
    print(llm_message)


if __name__ == "__main__":
    main()

Error message:

Position interpolate from 16x16 to 32x32
Load Minigpt-4-LLM Checkpoint: /root/autodl-tmp/minigptv2_checkpoint.pth
Model Initialization Finished
Traceback (most recent call last):
  File "/root/MiniGPT-4/CLI.py", line 106, in <module>
    main()
  File "/root/MiniGPT-4/CLI.py", line 97, in main
    llm_message = chat.answer(conv=chat_state,
  File "/root/MiniGPT-4/minigpt4/conversation/conversation.py", line 178, in answer
    generation_dict = self.answer_prepare(conv, img_list, **kargs)
  File "/root/MiniGPT-4/minigpt4/conversation/conversation.py", line 154, in answer_prepare
    embs = self.get_context_emb(conv, img_list)
  File "/root/MiniGPT-4/minigpt4/conversation/conversation.py", line 222, in get_context_emb
    prompt = conv.get_prompt()
  File "/root/MiniGPT-4/minigpt4/conversation/conversation.py", line 57, in get_prompt
    raise ValueError(f"Invalid style: {self.sep_style}")
ValueError: Invalid style: SeparatorStyle.SINGLE

Yuan-9 avatar Oct 24 '23 15:10 Yuan-9
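
The traceback above points at a likely cause: conv.get_prompt() in minigpt4/conversation/conversation.py dispatches on the SeparatorStyle enum defined in that module, while the script redefines its own SeparatorStyle class. A member of the local enum never compares equal to the module's SeparatorStyle.SINGLE, so the ValueError fires. A minimal sketch of the fix, assuming SeparatorStyle is importable from minigpt4.conversation.conversation as it is in demo_v2.py:

# Import the library's own enum instead of redefining it locally; the
# locally defined SeparatorStyle is a different class, so get_prompt()
# cannot match its SINGLE member against the module's SeparatorStyle.SINGLE.
from minigpt4.conversation.conversation import Chat, Conversation, SeparatorStyle

CONV_VISION = Conversation(
    system="",
    roles=(r"<s>[INST] ", r" [/INST]"),
    messages=[],
    offset=2,
    sep_style=SeparatorStyle.SINGLE,
    sep="",
)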

@ZhanYang-nwpu Can you share your code here?

BrianG13 avatar Oct 31 '23 16:10 BrianG13

What I did was just replace the code in demo_v2.py from line 520 to the end with the above code.

uyo9ko avatar Nov 01 '23 01:11 uyo9ko
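
For completeness, the stock demo itself is launched as described in the repo README (worth double-checking the flags against your checkout):

python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0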

Please help. With the current Vision-CAIR/MiniGPT-4, the output in the Gradio interface is garbled. How can I solve this? Thank you. My issue: https://github.com/Vision-CAIR/MiniGPT-4/issues/422

lckj2009 avatar Nov 14 '23 01:11 lckj2009

Could you please share this code so I can learn from it? Thank you! ([email protected])

LCY5600 avatar Mar 05 '24 07:03 LCY5600