
How to use other inference models in Visual ChatGPT?

jihwanp opened this issue 2 years ago · 4 comments

Hi, thanks for providing such outstanding work.

I want to use Visual ChatGPT to visualize the object bounding box or phrase box for text that I enter as input. Where should I modify your code so that the model recognizes the right task for my customized model?

Thanks

jihwanp · Mar 14 '23

I have a similar requirement and would like to use a better semantic segmentation model.

ddzipp · Mar 15 '23

Hi @jihwanp and @ddzipp,

Actually, this is super easy.

For example, if you want to enable semantic segmentation, you need to do two things:

First, write a custom semantic segmentation tool class, similar to the Text2Image class below.

# Requires: torch and diffusers; `prompts` is the tool-registration decorator
# defined in visual_chatgpt.py.
import os
import uuid

import torch
from diffusers import StableDiffusionPipeline


class Text2Image:
    def __init__(self, device):
        print(f"Initializing Text2Image to {device}")
        self.device = device
        self.torch_dtype = torch.float16 if 'cuda' in device else torch.float32
        self.pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5",
                                                            torch_dtype=self.torch_dtype)
        self.pipe.to(device)
        # Positive and negative prompt suffixes appended to every request.
        self.a_prompt = 'best quality, extremely detailed'
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, ' \
                        'fewer digits, cropped, worst quality, low quality'

    @prompts(name="Generate Image From User Input Text",
             description="useful when you want to generate an image from a user input text and save it to a file. "
                         "like: generate an image of an object or something, or generate an image that includes some objects. "
                         "The input to this tool should be a string, representing the text used to generate image. ")
    def inference(self, text):
        image_filename = os.path.join('image', f"{str(uuid.uuid4())[:8]}.png")
        prompt = text + ', ' + self.a_prompt
        image = self.pipe(prompt, negative_prompt=self.n_prompt).images[0]
        image.save(image_filename)
        print(
            f"\nProcessed Text2Image, Input Text: {text}, Output Image: {image_filename}")
        return image_filename
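
As an illustration of what such a tool could look like, here is a minimal sketch of a segmentation counterpart written in the same style. Everything in it is an assumption rather than code from the repository: it uses the Hugging Face transformers image-segmentation pipeline with the nvidia/segformer-b0-finetuned-ade-512-512 checkpoint as a placeholder model, and it reuses the prompts decorator from visual_chatgpt.py. Swap in whichever segmentation model you actually want to serve.

# Hypothetical segmentation tool, not part of the repo; requires transformers,
# Pillow, and numpy in addition to the dependencies above.
import os
import uuid

import numpy as np
from PIL import Image
from transformers import pipeline


class SemanticSegmentation:
    def __init__(self, device):
        print(f"Initializing SemanticSegmentation to {device}")
        self.device = device
        # transformers pipelines take a GPU index (or -1 for CPU)
        gpu_index = int(device.split(':')[-1]) if 'cuda' in device else -1
        self.segmenter = pipeline("image-segmentation",
                                  model="nvidia/segformer-b0-finetuned-ade-512-512",
                                  device=gpu_index)

    @prompts(name="Segment the Image",
             description="useful when you want to get the semantic segmentation map of an image. "
                         "The input to this tool should be a string, representing the image path. ")
    def inference(self, image_path):
        image = Image.open(image_path).convert("RGB")
        results = self.segmenter(image)
        # Paint every predicted class mask with its own color to build one seg map.
        seg_map = np.zeros((image.height, image.width, 3), dtype=np.uint8)
        rng = np.random.default_rng(0)
        for result in results:
            mask = np.array(result["mask"]) > 0
            seg_map[mask] = rng.integers(0, 256, size=3, dtype=np.uint8)
        output_filename = os.path.join('image', f"{str(uuid.uuid4())[:8]}_seg.png")
        Image.fromarray(seg_map).save(output_filename)
        print(f"\nProcessed SemanticSegmentation, Input Image: {image_path}, "
              f"Output Seg Map: {output_filename}")
        return output_filename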

Second, on the command line, add SemanticSegmentation_cuda:0 to the --load argument:

python visual_chatgpt.py --load "xxxxxx,SemanticSegmentation_cuda:0"
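
The part before the underscore must match the name of the tool class you defined, and the part after it is the device to load it on. For example, assuming you also want the default captioning and text-to-image tools (the exact list is up to you):

python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0,SemanticSegmentation_cuda:0"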

You can try writing code like this, and I would appreciate it if you could provide your feedback.

chenfei-wu · Mar 15 '23

Thanks for the quick reply, @chenfei-wu.

So the prompt (the @prompts decorator) that you've written in the example code is what Visual ChatGPT automatically uses to recognize the task and select the right model?

jihwanp · Mar 16 '23
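
For context on that question: the name and description strings passed to @prompts are what the agent sees when it decides which tool to call; it never inspects the implementation itself, so a precise description is what steers the model to the right task. A minimal sketch of how such a decorator is typically written follows (treat it as an approximation; the exact definition lives in visual_chatgpt.py):

# Sketch of a tool-registration decorator: it simply attaches the name and
# description to the decorated method so the tool list handed to the agent
# can be built from those attributes.
def prompts(name, description):
    def decorator(func):
        func.name = name
        func.description = description
        return func
    return decorator

Roughly speaking, each decorated inference method is then wrapped into a LangChain Tool with that name and description, and the agent's tool selection is driven purely by those strings.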


Thanks for your help! I carefully read the paper and the code in visual_chatgpt.py, and I now understand how to use other inference models. However, I ran into some problems (#238) when I tried to test the existing segmentation models.

ddzipp · Mar 17 '23