[bug]: Python error "all input tensors must be on the same device" when attempting clipseg/text_mask on M1 Mac
Is there an existing issue for this?
- [X] I have searched the existing issues
OS
macOS
GPU
mps
VRAM
64GB
What version did you experience this issue on?
2.3.4.post1
What happened?
I am attempting for the first time to use the --text_mask feature of inpainting from the CLI, as described here: https://github.com/invoke-ai/InvokeAI/blob/main/docs/features/CLI.md#inpainting
I am using a Mac Studio with M1 Ultra, 64 GB RAM, Python 3.9.16 in the venv.
This results in a Python error: RuntimeError: torch.cat(): all input tensors must be on the same device. Received mps:0 and cpu.
Other InvokeAI features (txt2img, img2img, inpainting) are working as expected.
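Judging from the final frame of the traceback below, the CLIPSeg model ends up on mps while at least one tensor it is fed stays on cpu. A minimal sketch of the kind of fix I would expect in ldm/invoke/txt2mask.py, assuming its segment() builds inputs via the transformers CLIPSegProcessor (the self.processor / self.model names here are illustrative, not the actual code):

```python
# Sketch only: move every processor tensor onto the model's device
# before the forward pass, so torch.cat() sees matching devices.
inputs = self.processor(
    text=[prompt], images=[image], padding=True, return_tensors="pt"
)
inputs = {k: v.to(self.model.device) for k, v in inputs.items()}  # mps:0 on this machine
outputs = self.model(**inputs)
```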
Example CLI output is as follows:
```
(stable-diffusion-1.5) invoke> a piece of cake -I /Users/me/Desktop/test.jpg -tm bagel
>> [TOKENLOG] Parsed Prompt: FlattenedPrompt:[Fragment:'a piece of cake'@1.0]
>> [TOKENLOG] Parsed Negative Prompt: FlattenedPrompt:[Fragment:''@1.0]
>> [TOKENLOG] Tokens (4):
a piece of cake
>> loaded input image of size 1170x2083 from /Users/me/Desktop/test.jpg
Traceback (most recent call last):
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 537, in prompt2image
    init_image, mask_image = self._make_images(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 889, in _make_images
    init_mask = self._txt2mask(image, text_mask, width, height, fit=fit)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 1274, in _txt2mask
    segmented = self.txt2mask.segment(image, prompt)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/invoke/txt2mask.py", line 109, in segment
    outputs = self.model(**inputs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1426, in forward
    vision_outputs = self.clip.vision_model(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 867, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 211, in forward
    embeddings = torch.cat([class_embeds, patch_embeds], dim=1)
RuntimeError: torch.cat(): all input tensors must be on the same device. Received mps:0 and cpu
>> Could not generate image.
>> Usage stats:
>> 0 image(s) generated in 0.29s
Outputs:
(stable-diffusion-1.5) invoke>
```
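For what it's worth, the failure itself is easy to reproduce outside InvokeAI; torch.cat() refuses mixed-device inputs by design, so any mps/cpu mismatch upstream surfaces exactly like this (tensor shapes here are arbitrary):

```python
import torch

# Minimal reproduction on an MPS-capable Mac: concatenating an mps
# tensor with a cpu tensor raises the same RuntimeError as above.
a = torch.zeros(1, 1, 768, device="mps")
b = torch.zeros(1, 196, 768)  # defaults to cpu
torch.cat([a, b], dim=1)
# RuntimeError: torch.cat(): all input tensors must be on the same device.
# Received mps:0 and cpu
```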
Screenshots
No response
Additional context
No response
Contact Details
No response

Just in case, I gave it another try on 2.3.5.post2. The error is the same ("all input tensors must be on the same device"), but the traceback has changed slightly, so to keep the issue up to date:
```
>> loaded input image of size 1500x1125 from /Users/me/Desktop/bagel.jpg
>> This input is larger than your defaults. If you run out of memory, please use a smaller image.
>> Initializing clipseg model for text to mask inference
** Could not generate image.
>> An error occurred:
Traceback (most recent call last):
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/invoke/CLI.py", line 193, in main
    main_loop(gen, opt, completer)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/invoke/CLI.py", line 452, in main_loop
    gen.prompt2image(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 542, in prompt2image
    init_image, mask_image = self._make_images(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 893, in _make_images
    init_mask = self._txt2mask(image, text_mask, width, height, fit=fit)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/generate.py", line 1280, in _txt2mask
    segmented = self.txt2mask.segment(image, prompt)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/ldm/invoke/txt2mask.py", line 109, in segment
    outputs = self.model(**inputs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1426, in forward
    vision_outputs = self.clip.vision_model(
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 867, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Volumes/Storage/AI/InvokeAI/.venv/lib/python3.9/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 211, in forward
    embeddings = torch.cat([class_embeds, patch_embeds], dim=1)
RuntimeError: torch.cat(): all input tensors must be on the same device. Received mps:0 and cpu
```
I ran into this exact error at modeling_clipseg.py line 211 today as well. I tried to use `!mask [image] -tm [thing to mask] [mask threshold]` to generate mask files for use in the canvas and other applications. It would be great to have this working, since it automates the masking process instead of doing all the work in Photoshop plus the canvas.
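Until this is fixed, one standalone workaround is to run CLIPSeg directly via transformers on the CPU and save the result as a mask PNG for use in inpainting or the canvas. A rough sketch, where the file paths, prompt, and 0.5 threshold are illustrative (CIDAS/clipseg-rd64-refined is the checkpoint the transformers docs use):

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Everything stays on cpu here, sidestepping the mps/cpu mismatch.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("/Users/me/Desktop/bagel.jpg").convert("RGB")
inputs = processor(text=["bagel"], images=[image], return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # low-resolution segmentation logits

probs = torch.sigmoid(logits).squeeze()              # (352, 352) probabilities
mask = (probs > 0.5).numpy().astype(np.uint8) * 255  # binary 0/255 mask
Image.fromarray(mask).resize(image.size).save("/Users/me/Desktop/bagel_mask.png")
```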