How to use other inference models in Visual ChatGPT?
Hi, thanks for providing such outstanding work.
I want to use Visual ChatGPT to visualize the object bounding box or phrase box that I enter as input. Where should I modify your code so that the model recognizes the correct task for my customized model?
Thanks
I have a similar requirement and would like to use a better semantic segmentation model.
Hi @jihwanp and @ddzipp ,
Actually, this is super easy.
For example, if you want to enable semantic segmentation, you need to do two things:
First, write a custom semantic segmentation class similar to Text2Image:
import os
import uuid

import torch
from diffusers import StableDiffusionPipeline

# @prompts is the tool-registration decorator defined in visual_chatgpt.py

class Text2Image:
    def __init__(self, device):
        print(f"Initializing Text2Image to {device}")
        self.device = device
        self.torch_dtype = torch.float16 if 'cuda' in device else torch.float32
        self.pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5",
                                                            torch_dtype=self.torch_dtype)
        self.pipe.to(device)
        self.a_prompt = 'best quality, extremely detailed'
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, ' \
                        'fewer digits, cropped, worst quality, low quality'

    @prompts(name="Generate Image From User Input Text",
             description="useful when you want to generate an image from a user input text and save it to a file. "
                         "like: generate an image of an object or something, or generate an image that includes some objects. "
                         "The input to this tool should be a string, representing the text used to generate image. ")
    def inference(self, text):
        image_filename = os.path.join('image', f"{str(uuid.uuid4())[:8]}.png")
        prompt = text + ', ' + self.a_prompt
        image = self.pipe(prompt, negative_prompt=self.n_prompt).images[0]
        image.save(image_filename)
        print(
            f"\nProcessed Text2Image, Input Text: {text}, Output Image: {image_filename}")
        return image_filename
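For reference, a minimal sketch of what the corresponding SemanticSegmentation class could look like is below. It only mirrors the structure of the Text2Image example above; load_my_segmentation_model is a hypothetical placeholder for whatever segmentation backend you choose, and the name/description strings are just illustrative.

# Sketch only, assuming the same surroundings as visual_chatgpt.py (os, uuid, @prompts).
# load_my_segmentation_model is a hypothetical placeholder for your own model.
from PIL import Image

class SemanticSegmentation:
    def __init__(self, device):
        print(f"Initializing SemanticSegmentation to {device}")
        self.device = device
        # Replace with your preferred segmentation backend (hypothetical helper).
        self.model = load_my_segmentation_model(device)

    @prompts(name="Segment the Image",
             description="useful when you want to get the semantic segmentation of an image. "
                         "The input to this tool should be a string, representing the image_path. ")
    def inference(self, image_path):
        image = Image.open(image_path)
        # Placeholder call; assumed to return a PIL Image containing the segmentation map.
        seg_map = self.model(image)
        seg_filename = os.path.join('image', f"{str(uuid.uuid4())[:8]}.png")
        seg_map.save(seg_filename)
        print(f"\nProcessed SemanticSegmentation, Input Image: {image_path}, "
              f"Output Image: {seg_filename}")
        return seg_filename

The name and description passed to @prompts are what the agent reads when deciding which tool to invoke, so keep the description specific to segmentation.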
Second, in the command line, add SemanticSegmentation_cuda:0 to the --load argument:
python visual_chatgpt.py --load "xxxxxx,SemanticSegmentation_cuda:0"
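For example, assuming you also load some of the default tools (ImageCaptioning and Text2Image are class names from visual_chatgpt.py; substitute whatever classes you actually define), the full command might look like:

python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0,SemanticSegmentation_cuda:0"

The --load string maps each class name to the device it should run on, so SemanticSegmentation_cuda:0 instantiates your new class on GPU 0.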
You can try writing code like this, and I would appreciate it if you could provide your feedback.
Thanks for the quick reply, @chenfei-wu.
So the prompt (@prompts) you've written in the example code is what Visual ChatGPT automatically uses to select the right model?
Thanks for your help! I carefully read the paper and the code in visual_chatgpt.py, and I now understand how to use other inference models. However, I ran into some problems (#238) when I tried to test the existing segmentation models.