MPS keeps crashing

Open enzyme69 opened this issue 1 year ago • 20 comments

I got the ControlNet extension loading fine, but it keeps on crashing when I use scribble:

  0%|                                                    | 0/20 [00:00<?, ?it/s](mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:228:0: error: 'mps.add' op requires the same element type for all operands and results
(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:228:0: note: see current operation: %5 = "mps.add"(%4, %arg2) : (tensor<2x1280xf32>, tensor<*xf16>) -> tensor<*xf32>
zsh: segmentation fault  ./webui.sh
/opt/homebrew/Cellar/python@3.10/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

enzyme69 avatar Feb 15 '23 13:02 enzyme69

Getting the same error when I use any model

Loading preprocessor: depth, model: control_sd15_depth [fef5e48e]
Loaded state_dict from [/Users/philbuck/sd/extensions/sd-webui-controlnet/models/control_sd15_depth.pth]
ControlNet model control_sd15_depth [fef5e48e] loaded.
  0%| | 0/16 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/0aa643d0-625a-11ed-b319-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort ./webui.sh
(base) philbuck@PhilsMacStudio sd % /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Philbuck84 avatar Feb 15 '23 16:02 Philbuck84

You can get it working if you use --no-half, but it's obviously a lot slower and uses a lot more memory. Hoping for a solution to this; the tool looks really cool.

jwooldridge234 avatar Feb 15 '23 16:02 jwooldridge234

My web UI launches with the argument --no-half-vae. Does that have a different effect than --no-half?

Philbuck84 avatar Feb 15 '23 17:02 Philbuck84

Yeah, I still get the tensor type mismatch with --no-half-vae; only --no-half fixes it. I'm not the most knowledgeable on PyTorch, but I believe the issue occurs when an operation combines a tensor of type float16 with a tensor of type float32. --no-half forces everything to use float32 and fixes the issue, but at a significant cost to performance.
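
The note line in the crash log already names the operand types, so the float16/float32 mix described above can be confirmed from the log alone. A minimal sketch, assuming the log format shown in this thread; `operand_element_types` and `has_mixed_precision` are hypothetical helper names, not part of the web UI or the extension:

```python
import re

# Matches MLIR-style tensor types such as tensor<2x1280xf32> or tensor<*xf16>
# and captures the trailing element type (f16, f32, ...).
TENSOR_TYPE = re.compile(r"tensor<(?:[^>]*x)?([a-z]+\d+)>")


def operand_element_types(note: str) -> list[str]:
    """Return the element types of the operands listed before '->'."""
    operands = note.split("->")[0]
    return TENSOR_TYPE.findall(operands)


def has_mixed_precision(note: str) -> bool:
    """True when the operands do not all share one element type."""
    return len(set(operand_element_types(note))) > 1


# Note line copied from the first crash log in this thread.
note = '%5 = "mps.add"(%4, %arg2) : (tensor<2x1280xf32>, tensor<*xf16>) -> tensor<*xf32>'
print(operand_element_types(note))  # ['f32', 'f16']
print(has_mixed_precision(note))    # True
```

An f32 operand next to an f16 one is the situation --no-half avoids by keeping everything in float32.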

jwooldridge234 avatar Feb 15 '23 17:02 jwooldridge234

Hmm, --no-half unfortunately doesn't fix it for me. I get a whole different set of errors.

Philbuck84 avatar Feb 15 '23 17:02 Philbuck84

Interesting... mind posting them and your system specs?

jwooldridge234 avatar Feb 15 '23 17:02 jwooldridge234

Sure. System specs: Mac Studio M1 Ultra running Ventura 13.1. The error is super long:

ControlNet model control_sd15_openpose [fef5e48e] loaded.
Error running process: /Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/controlnet.py
Traceback (most recent call last):
  File "/Users/philbuck/sd/modules/scripts.py", line 386, in process
    script.process(p, *script_args)
  File "/Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/controlnet.py", line 270, in process
    input_image = HWC3(image['image'])
TypeError: 'NoneType' object is not subscriptable

  0%| | 0/20 [00:00<?, ?it/s]
Error completing request
Arguments: ('task(5q0lfqy3o0p0qe2)', 'Dog', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 768, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, 'keyword prompt', 'keyword1, keyword2', 'None', 'textual inversion first', True, 'openpose', 'control_sd15_openpose [fef5e48e]', 1, None, False, 'Scale to Fit (Inner Fit)', False, False, False, 3, 0, False, False, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0, None, True, None, None, False, 10.0, True, 30.0, True, 0.0, 'Lanczos', 1) {}
Traceback (most recent call last):
  File "/Users/philbuck/sd/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/Users/philbuck/sd/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/Users/philbuck/sd/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/Users/philbuck/sd/modules/processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "/Users/philbuck/sd/modules/processing.py", line 628, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "/Users/philbuck/sd/modules/processing.py", line 828, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "/Users/philbuck/sd/modules/sd_samplers_kdiffusion.py", line 323, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/Users/philbuck/sd/modules/sd_samplers_kdiffusion.py", line 221, in launch_sampling
    return func()
  File "/Users/philbuck/sd/modules/sd_samplers_kdiffusion.py", line 323, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/opt/homebrew/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/philbuck/sd/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/philbuck/sd/modules/sd_samplers_kdiffusion.py", line 116, in forward
    x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/philbuck/sd/repositories/k-diffusion/k_diffusion/external.py", line 114, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "/Users/philbuck/sd/repositories/k-diffusion/k_diffusion/external.py", line 140, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "/Users/philbuck/sd/modules/sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "/Users/philbuck/sd/modules/sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "/Users/philbuck/sd/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/philbuck/sd/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/cldm.py", line 107, in forward2
    return forward(*args, **kwargs)
  File "/Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/cldm.py", line 72, in forward
    control = outer.control_model(x=x, hint=outer.hint_cond, timesteps=timesteps, context=context)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/cldm.py", line 381, in forward
    guided_hint = self.input_hint_block(hint, emb, context)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/philbuck/sd/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 86, in forward
    x = layer(x)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/philbuck/sd/extensions-builtin/Lora/lora.py", line 182, in lora_Conv2d_forward
    return lora_forward(self, input, torch.nn.Conv2d_forward_before_lora(self, input))
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
TypeError: conv2d() received an invalid combination of arguments - got (NoneType, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:

  • (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups) didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)
  • (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups) didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)

Philbuck84 avatar Feb 15 '23 17:02 Philbuck84

Hmm. I wonder if we might be running different versions of pytorch. Are you using the mac-specific build discussed here? I recommend using it even though it doesn't resolve this issue, since it provides a ~25% speed boost on MPS.

jwooldridge234 avatar Feb 15 '23 18:02 jwooldridge234

I was not running that mac-specific build. Thanks for tipping me to that resource. I'm installing now and will try ControlNet again to see if I get the same error.

Philbuck84 avatar Feb 15 '23 18:02 Philbuck84

Unfortunately, still running into the same errors even after updating to the mac-specific build. 🤷‍♂️

Philbuck84 avatar Feb 15 '23 18:02 Philbuck84

Hmm. When you launch, do you use webui.sh?

jwooldridge234 avatar Feb 15 '23 18:02 jwooldridge234

Yes

Philbuck84 avatar Feb 15 '23 18:02 Philbuck84

And you've pulled the latest Automatic update & sd-webui-controlnet update, correct? Just trying to figure out what could be different in our setup.

jwooldridge234 avatar Feb 15 '23 18:02 jwooldridge234

Wow, ok I did need to update ControlNet and it's working with --no-half. Thank you!

Philbuck84 avatar Feb 15 '23 18:02 Philbuck84

No worries! Glad it helped. Hopefully we can get a solution that allows us to use float16.

jwooldridge234 avatar Feb 15 '23 19:02 jwooldridge234

With the command-line args --opt-sub-quad-attention and --no-half together, it runs more than twice as fast for me (7-8 s/it vs 20 s/it). Still terrible but a bit better.

jwooldridge234 avatar Feb 15 '23 21:02 jwooldridge234

Yes, float16 doesn’t work correctly with MPS on this extension yet. I will try to fix that but I can’t make any guarantees at this point. First though I want to fix the normal and depth map preprocessors returning bad/inconsistent results on MPS.

brkirch avatar Feb 15 '23 22:02 brkirch

Sounds good. I'll take a look after work and see if I can debug.

jwooldridge234 avatar Feb 15 '23 22:02 jwooldridge234

[Screenshot, 2023-02-15 11:40:34 PM]

I have to assume that v21 is now required to use ControlNet. Whenever I use v15 models, it crashes.

mylife4aiur5 avatar Feb 16 '23 05:02 mylife4aiur5

I'm also running a Mac Studio M1 Ultra on Ventura 13.1 with the latest versions of everything: Automatic1111, ControlNet, Python, et al. I have applied the fixes that worked for Philbuck84 above, but Python still crashes every time I try to render with ControlNet, with the exact same warning, ending in "There appear to be 1 leaked semaphore objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d '"

David-Gianotti avatar Feb 16 '23 09:02 David-Gianotti

I can confirm now that it is working for me after adding --no-half to the COMMANDLINE_ARGS line in the webui-user.sh file.
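
For anyone scripting this change, here is a minimal sketch of adding the flag. It assumes the launcher script contains a line of the form `export COMMANDLINE_ARGS="..."`, possibly commented out with `#`. `add_flag` is a hypothetical helper that works on the script's text rather than on the file itself:

```python
import re

# Assumed shape of the settings line; a leading '#' (commented out) is allowed.
ARGS_LINE = re.compile(r'^#?[ \t]*export COMMANDLINE_ARGS="([^"]*)"[ \t]*$', re.MULTILINE)


def add_flag(script_text: str, flag: str = "--no-half") -> str:
    """Return script_text with `flag` present in the COMMANDLINE_ARGS line."""

    def rewrite(match: re.Match) -> str:
        args = match.group(1).split()
        if flag not in args:  # exact match, so --no-half-vae does not count
            args.append(flag)
        return 'export COMMANDLINE_ARGS="' + " ".join(args) + '"'

    new_text, count = ARGS_LINE.subn(rewrite, script_text, count=1)
    if count == 0:
        # No such line found: append one instead of guessing where it belongs.
        new_text = script_text.rstrip("\n") + f'\nexport COMMANDLINE_ARGS="{flag}"\n'
    return new_text


before = 'export COMMANDLINE_ARGS="--no-half-vae"\n'
print(add_flag(before), end="")  # export COMMANDLINE_ARGS="--no-half-vae --no-half"
```

Calling it a second time with `--opt-sub-quad-attention` would add the other flag mentioned earlier in the thread.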

David-Gianotti avatar Feb 17 '23 02:02 David-Gianotti

For some unknown reason, as of today I keep getting the crash again. It was working fine for many weeks and suddenly...

Loading preprocessor: none
  0%| | 0/20 [00:00<?, ?it/s](mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:39:0: error: 'mps.matmul' op contracting dimensions differ 1024 & 768
(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:39:0: note: see current operation: %3 = "mps.matmul"(%arg0, %2) {transpose_lhs = false, transpose_rhs = false} : (tensor<1x77x1024xf32>, tensor<768x320xf32>) -> tensor<1x77x320xf32>
zsh: segmentation fault ./webui.sh
jimmygunawan@192-168-1-100 stable-diffusion-webui % /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
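
The `mps.matmul` note above multiplies a 1x77x1024 tensor by a 768x320 weight, so the inner dimensions (1024 and 768) do not line up. 1024 is the text-embedding width of Stable Diffusion 2.x checkpoints and 768 that of 1.x ones. One possible reading, an assumption rather than something confirmed in this thread, is a 2.x base checkpoint paired with a ControlNet model built for 1.5. The shape rule itself can be sketched in plain Python; `matmul_result_shape` is a hypothetical helper:

```python
def matmul_result_shape(lhs: tuple, rhs: tuple) -> tuple:
    """Result shape of a batched lhs times a 2-D rhs; raises ValueError on mismatch."""
    if len(lhs) < 2 or len(rhs) != 2:
        raise ValueError("expected a batched lhs and a 2-D rhs")
    if lhs[-1] != rhs[0]:
        # Last axis of lhs must equal first axis of rhs (the contracting dimension).
        raise ValueError(f"contracting dimensions differ {lhs[-1]} & {rhs[0]}")
    return lhs[:-1] + (rhs[1],)


# Shapes taken from the log above.
try:
    matmul_result_shape((1, 77, 1024), (768, 320))
except ValueError as err:
    print(err)  # contracting dimensions differ 1024 & 768

# With a 768-wide input the same weight is compatible.
print(matmul_result_shape((1, 77, 768), (768, 320)))  # (1, 77, 320)
```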

enzyme69 avatar Mar 27 '23 07:03 enzyme69

If I use the diff_openpose model, it works, but the more recent one keeps crashing my Automatic1111 web UI.

enzyme69 avatar Mar 27 '23 07:03 enzyme69