[bug]: Python error "all input tensors must be on the same device" when attempting clipseg/text_mask on M1 Mac
Is there an existing issue for this?
- [X] I have searched the existing issues
OS
macOS
GPU
mps
VRAM
64GB
What version did you experience this issue on?
2.3.4.post1
What happened?
I am attempting for the first time to use the --text_mask feature of inpainting from the CLI, as described here: https://github.com/invoke-ai/InvokeAI/blob/main/docs/features/CLI.md#inpainting
I am using a Mac Studio with M1 Ultra, 64 GB RAM, Python 3.9.16 in the venv.
This results in a Python error: RuntimeError: torch.cat(): all input tensors must be on the same device. Received mps:0 and cpu.
Other InvokeAI features (txt2img, img2img, inpainting) are working as expected.
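Judging from the final frame of the traceback below, the CLIPSeg model ends up on mps while at least one tensor it is fed stays on cpu. A minimal sketch of the kind of fix I would expect in ldm/invoke/txt2mask.py, assuming its segment() builds inputs via the transformers CLIPSegProcessor (the self.processor / self.model names here are illustrative, not the actual code):

```python
# Sketch only: move every processor tensor onto the model's device
# before the forward pass, so torch.cat() sees matching devices.
inputs = self.processor(
    text=[prompt], images=[image], padding=True, return_tensors="pt"
)
inputs = {k: v.to(self.model.device) for k, v in inputs.items()}  # mps:0 on this machine
outputs = self.model(**inputs)
```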
Example CLI output is as follows:
```
(stable-diffusion-1.5) invoke> a piece of cake -I /Users/me/Desktop/test.jpg -tm bagel
>> [TOKENLOG] Parsed Prompt: FlattenedPrompt:[Fragment:'a piece of cake'@1.0]
>> [TOKENLOG] Parsed Negative Prompt: FlattenedPrompt:[Fragment:''@1.0]
>> [TOKENLOG] Tokens (4):
a piece of cake
>> loaded input image of size 1170x2083 from /Users/me/Desktop/test.jpg
Traceback (most recent call last):
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 537, in prompt2image
    init_image, mask_image = self._make_images(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 889, in _make_images
    init_mask = self._txt2mask(image, text_mask, width, height, fit=fit)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 1274, in _txt2mask
    segmented = self.txt2mask.segment(image, prompt)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/invoke/txt2mask.py", line 109, in segment
    outputs = self.model(**inputs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1426, in forward
    vision_outputs = self.clip.vision_model(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 867, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 211, in forward
    embeddings = torch.cat([class_embeds, patch_embeds], dim=1)
RuntimeError: torch.cat(): all input tensors must be on the same device. Received mps:0 and cpu
>> Could not generate image.
>> Usage stats:
>> 0 image(s) generated in 0.29s
Outputs:
(stable-diffusion-1.5) invoke>
```
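For what it's worth, the failure itself is easy to reproduce outside InvokeAI; torch.cat() refuses mixed-device inputs by design, so any mps/cpu mismatch upstream surfaces exactly like this (tensor shapes here are arbitrary):

```python
import torch

# Minimal reproduction on an MPS-capable Mac: concatenating an mps
# tensor with a cpu tensor raises the same RuntimeError as above.
a = torch.zeros(1, 1, 768, device="mps")
b = torch.zeros(1, 196, 768)  # defaults to cpu
torch.cat([a, b], dim=1)
# RuntimeError: torch.cat(): all input tensors must be on the same device.
# Received mps:0 and cpu
```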
Screenshots
No response
Additional context
No response
Contact Details
No response

Just in case, I gave it another try on 2.3.5.post2. The error is the same ("all input tensors must be on the same device"), but the traceback has changed slightly, so to keep the issue up to date:
```
>> loaded input image of size 1500x1125 from /Users/me/Desktop/bagel.jpg
>> This input is larger than your defaults. If you run out of memory, please use a smaller image.
>> Initializing clipseg model for text to mask inference
** Could not generate image.
>> An error occurred:
Traceback (most recent call last):
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/invoke/CLI.py", line 193, in main
    main_loop(gen, opt, completer)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/invoke/CLI.py", line 452, in main_loop
    gen.prompt2image(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 542, in prompt2image
    init_image, mask_image = self._make_images(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 893, in _make_images
    init_mask = self._txt2mask(image, text_mask, width, height, fit=fit)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 1280, in _txt2mask
    segmented = self.txt2mask.segment(image, prompt)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/invoke/txt2mask.py", line 109, in segment
    outputs = self.model(**inputs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1426, in forward
    vision_outputs = self.clip.vision_model(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 867, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 211, in forward
    embeddings = torch.cat([class_embeds, patch_embeds], dim=1)
RuntimeError: torch.cat(): all input tensors must be on the same device. Received mps:0 and cpu
```
I ran into this exact error at modeling_clipseg.py line 211 today as well. I tried to use `!mask [image] -tm [thing to mask] [mask threshold]` to generate mask files for use in the canvas and other applications. It would be great to have this working, since it automates the masking process instead of doing all the work in Photoshop plus the canvas.
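Until this is fixed, one standalone workaround is to run CLIPSeg directly via transformers on the CPU and save the result as a mask PNG for use in inpainting or the canvas. A rough sketch, where the file paths, prompt, and 0.5 threshold are illustrative (CIDAS/clipseg-rd64-refined is the checkpoint the transformers docs use):

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Everything stays on cpu here, sidestepping the mps/cpu mismatch.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("/Users/me/Desktop/bagel.jpg").convert("RGB")
inputs = processor(text=["bagel"], images=[image], return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # low-resolution segmentation logits

probs = torch.sigmoid(logits).squeeze()              # (352, 352) probabilities
mask = (probs > 0.5).numpy().astype(np.uint8) * 255  # binary 0/255 mask
Image.fromarray(mask).resize(image.size).save("/Users/me/Desktop/bagel_mask.png")
```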