visual-chatgpt
Can someone help me take a look? I get an error when replacing image content.
Initializing VisualChatGPT
Initializing StableDiffusionInpaint to cuda:0
text_encoder/model.safetensors not found
Fetching 16 files: 100% 16/16 [00:00<00:00, 52143.64it/s]
/usr/local/lib/python3.8/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
Initializing ImageCaptioning to cuda:0
Initializing T2I to cuda:0
Fetching 15 files: 100% 15/15 [00:00<00:00, 42027.09it/s]
Running on local URL: http://0.0.0.0:7860/
---------
> Entering new AgentExecutor chain...
Yes
Action: Replace Something From The Photo
Action Input: image/488f1c18.png, cat, teddy bear
replace_part_of_image: replace_with_txt teddy bear
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/gradio/routes.py", line 384, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.8/site-packages/gradio/blocks.py", line 1032, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.8/site-packages/gradio/blocks.py", line 844, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "visual_chatgpt.py", line 840, in run_text
res = self.agent({"input": text})
File "/usr/local/lib/python3.8/site-packages/langchain/chains/base.py", line 168, in __call__
raise e
File "/usr/local/lib/python3.8/site-packages/langchain/chains/base.py", line 165, in __call__
outputs = self._call(inputs)
File "/usr/local/lib/python3.8/site-packages/langchain/agents/agent.py", line 503, in _call
next_step_output = self._take_next_step(
File "/usr/local/lib/python3.8/site-packages/langchain/agents/agent.py", line 420, in _take_next_step
observation = tool.run(
File "/usr/local/lib/python3.8/site-packages/langchain/tools/base.py", line 71, in run
raise e
File "/usr/local/lib/python3.8/site-packages/langchain/tools/base.py", line 68, in run
observation = self._run(tool_input)
File "/usr/local/lib/python3.8/site-packages/langchain/agents/tools.py", line 17, in _run
return self.func(tool_input)
File "visual_chatgpt.py", line 164, in replace_part_of_image
updated_image = self.inpainting(prompt=replace_with_txt, image=original_image, mask_image=mask_image).images[0]
File "/usr/local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 798, in __call__
mask, masked_image = prepare_mask_and_masked_image(image, mask_image)
File "/usr/local/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 135, in prepare_mask_and_masked_image
masked_image = image * (mask < 0.5)
RuntimeError: The size of tensor a (384) must match the size of tensor b (512) at non-singleton dimension 3
I ran into the same problem. The simplest workaround is to only pass 512x512 images each time. Alternatively you can modify the code below and then pass larger images such as 1024x1024, but the results are not great (see the sketch after the code).
import numpy as np
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

class MaskFormer:
    def __init__(self, device):
        self.device = device
        self.processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
        self.model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined").to(device)

    def inference(self, image_path, text):
        threshold = 0.5
        min_area = 0.02
        padding = 20
        original_image = Image.open(image_path)
        # CLIPSeg works on a fixed 512x512 input
        image = original_image.resize((512, 512))
        inputs = self.processor(text=text, images=image, padding="max_length", return_tensors="pt").to(self.device)
        with torch.no_grad():
            outputs = self.model(**inputs)
        mask = torch.sigmoid(outputs[0]).squeeze().cpu().numpy() > threshold
        # Skip masks that cover too little of the image to be meaningful
        area_ratio = len(np.argwhere(mask)) / (mask.shape[0] * mask.shape[1])
        if area_ratio < min_area:
            return None
        # Dilate the mask by `padding` pixels around every positive pixel
        true_indices = np.argwhere(mask)
        mask_array = np.zeros_like(mask, dtype=bool)
        for idx in true_indices:
            padded_slice = tuple(slice(max(0, i - padding), i + padding + 1) for i in idx)
            mask_array[padded_slice] = True
        visual_mask = (mask_array * 255).astype(np.uint8)
        image_mask = Image.fromarray(visual_mask)
        return image_mask.resize(original_image.size)  # change this line
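For reference, this is roughly what that change can look like: pick one working resolution and make sure the image and the mask both use it before calling the inpainting pipeline, so prepare_mask_and_masked_image never sees a 384 vs 512 mismatch. This is only a sketch, not the repo's actual fix; the function name replace_part_of_image_fixed, the WORKING_SIZE constant, and passing mask_former / inpaint_pipe as arguments are my own choices for illustration.

from PIL import Image

WORKING_SIZE = (512, 512)  # matches what the downstream models expect

def replace_part_of_image_fixed(mask_former, inpaint_pipe, image_path, to_be_replaced_txt, replace_with_txt):
    """Hypothetical helper: force image and mask onto one resolution before inpainting."""
    original_image = Image.open(image_path).convert("RGB")
    mask_image = mask_former.inference(image_path, to_be_replaced_txt)
    # Resize both to the same working resolution so their tensors line up
    image = original_image.resize(WORKING_SIZE)
    mask_image = mask_image.resize(WORKING_SIZE)
    updated = inpaint_pipe(prompt=replace_with_txt, image=image, mask_image=mask_image).images[0]
    # Scale back to the original size; this upscaling is why quality suffers
    # for large inputs such as 1024x1024
    return updated.resize(original_image.size)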
The resolution is not easy to change in this code, because many of the downstream models only accept 512x512 images, so manually changing the resolution in the code is error-prone. Also, the code does not seem to resize the input image anywhere, so making sure every input image is 512x512 is the safest option, unless you add a preprocessing module of your own (see the sketch below).
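A minimal sketch of such a preprocessing step, assuming you only need square 512x512 inputs; the function name and the center-crop choice are mine, not part of the repo:

from PIL import Image

def preprocess_to_512(image_path, out_path=None):
    """Center-crop to a square and resize to 512x512 before handing the file to the tools."""
    img = Image.open(image_path).convert("RGB")
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((512, 512))
    img.save(out_path or image_path)
    return out_path or image_path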
OK, thanks.
Hi @YuanXiaoYaoZiZai, the latest update to the code fixes this resize issue. You can now try feeding in rectangular (non-square) images. Thanks
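For anyone still on an older checkout, the kind of normalization such a fix implies can also be applied when the image is uploaded. This is an assumed sketch, not copied from the updated code: shrink the image so its longer side is at most 512 and round both sides to multiples of 64, which the Stable Diffusion pipelines accept.

import numpy as np
from PIL import Image

def normalize_upload(image_path):
    """Resize a possibly rectangular image so both sides are multiples of 64 and at most ~512."""
    img = Image.open(image_path).convert("RGB")
    width, height = img.size
    ratio = min(512 / width, 512 / height)
    new_w = max(64, int(np.round(width * ratio / 64.0)) * 64)
    new_h = max(64, int(np.round(height * ratio / 64.0)) * 64)
    img = img.resize((new_w, new_h))
    img.save(image_path)
    return image_path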