InvokeAI [bug] MacOs: diffusers model, Image 768x768, failed assertion NDArray > 2**32

Is there an existing issue for this?

[X] I have searched the existing issues

OS

macOS

GPU

mps

VRAM

128GB

What happened?

@keturn, when I use a diffusers model (deliberate), I get a failed assertion when the image is 768x768.

NOTE: Before the update to diffusers 0.12.1 and transformers 4.26.0 I was able to generate images of this size and larger.

I also tried to run !convert_model again, but it fails too.

My local repo is updated to c18db4e47b10cf1658612f3eec2d537a789b10ea and my .venv was updated with python -mpip install -r requirements.txt

Screenshots

(midjourney) invoke> !switch d-deliberate

Current VRAM usage: 0.00G
Offloading midjourney to CPU
Loading diffusers model from /users/ivano/Junk/SD/diffusers/deliberate-v1.1
 | Using more accurate float32 precision
 | Default image dimensions = 512 x 512
Model loaded in 1.07s
Textual inversions available: Style-GlassFinal, Style-Princess
Setting Sampler to k_lms (LMSDiscreteScheduler)
(d-deliberate) invoke> a nice dog in the garden -H 768 -W 768
objc[26679]: Class CaptureDelegate is implemented in both /Users/ivano/Code/Ai/invokeai/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x13bc76480) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_videoio.4.7.0.dylib (0x369c78880). One of the two will be used. Which one is undefined.
objc[26679]: Class CVWindow is implemented in both /Users/ivano/Code/Ai/invokeai/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x13bc764d0) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x31f6e0b10). One of the two will be used. Which one is undefined.
objc[26679]: Class CVView is implemented in both /Users/ivano/Code/Ai/invokeai/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x13bc764f8) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x31f6e0b38). One of the two will be used. Which one is undefined.
objc[26679]: Class CVSlider is implemented in both /Users/ivano/Code/Ai/invokeai/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x13bc76520) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x31f6e0b60). One of the two will be used. Which one is undefined.
Patchmatch initialized
Generating: 0%| | 0/1 [00:00<?, ?it/s]
/Users/ivano/Code/Ai/invokeai/.venv/lib/python3.10/site-packages/diffusers/schedulers/scheduling_lms_discrete.py:268: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]
/AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:724: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
| 0/50 [00:00<?, ?it/s]
zsh: abort python ./scripts/invoke.py
/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Additional context

This only happens with the diffusers model; when I use a ckpt model, I can generate images up to 960x960.

Contact Details

No response

Jan 28 '23 02:01 i3oc9i

I think this is the same error that usually happens on square images when the dimensions are a power of 2. 768 is not a power of 2, though. It's 3 * 2^8.
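As a quick check of that arithmetic, here is a minimal Python sketch. The helper names are made up for illustration and are not part of InvokeAI or diffusers.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive integer power of two."""
    return n > 0 and (n & (n - 1)) == 0


def odd_part_and_exponent(n: int) -> tuple[int, int]:
    """Split a positive integer n into odd * 2**k and return (odd, k)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return n, k


if __name__ == "__main__":
    for side in (512, 768):
        odd, k = odd_part_and_exponent(side)
        print(f"{side}: power of two = {is_power_of_two(side)}, {odd} * 2**{k}")
```

Running it shows that 512 is a power of two while 768 factors as 3 * 2**8, as stated above.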

Jan 28 '23 02:01 whosawhatsis

@whosawhatsis

I think this is the same error that usually happens on square images when the dimensions are a power of 2. 768 is not a power of 2, though. It's 3 * 2^8.

It was working before I updated my local repo to the latest commit and ran pip install. It is also not specific to square images: 768x704 fails too, as do 768x832, 768x960 and 640x832.

All of those dimensions were working with version 0.11 of diffusers; see the next comment.
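To illustrate how a single tensor can cross the 2**32-byte limit named in the assertion, here is a rough back-of-the-envelope sketch. Everything in it is an assumption made for illustration, not a confirmed diagnosis: it supposes a full float32 self-attention score tensor of shape (batch * heads, tokens, tokens), a latent downscale factor of 8, 8 attention heads, and a batch of 2. The function name is hypothetical. Whether such a tensor is actually allocated depends on the attention implementation in use, and the estimate does not explain every data point in this thread.

```python
LIMIT_BYTES = 2**32  # threshold quoted in the MPSNDArray assertion message


def attention_scores_bytes(width, height, *, batch=2, heads=8,
                           bytes_per_element=4, latent_downscale=8):
    """Estimate the size of one full attention score tensor (all values assumed)."""
    # One token per latent pixel at the highest-resolution attention layer.
    tokens = (width // latent_downscale) * (height // latent_downscale)
    return batch * heads * tokens * tokens * bytes_per_element


if __name__ == "__main__":
    for width, height in [(512, 512), (768, 704), (768, 768), (640, 832)]:
        size = attention_scores_bytes(width, height)
        verdict = "over" if size > LIMIT_BYTES else "under"
        print(f"{width}x{height}: {size:,} bytes ({verdict} 2**32)")
```

Under these assumptions 512x512 stays under the limit while the failing sizes listed above exceed it, but the same images worked with diffusers 0.11 and with ckpt models, so the real allocation pattern clearly varies between code paths.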

Jan 28 '23 03:01 i3oc9i

I reverted the .venv to diffusers==0.11 and transformers==4.25 on the latest commit, and 768x768 and 960x960 work!

I believe there is a regression in the latest version of diffusers/transformers in the macOS environment.

diff --git a/environments-and-requirements/requirements-base.txt b/environments-and-requirements/requirements-base.txt
index b7a3a2a7..0c791e1d 100644
--- a/environments-and-requirements/requirements-base.txt
+++ b/environments-and-requirements/requirements-base.txt
@@ -2,7 +2,7 @@
 accelerate
 albumentations
 datasets
-diffusers[torch]~=0.12
+diffusers[torch]==0.11
 dnspython==2.2.1
 einops
 eventlet
@@ -37,7 +37,7 @@ taming-transformers-rom1504
 test-tube>=0.7.5
 torch-fidelity
 torchmetrics
-transformers~=4.26
+transformers==4.25
 windows-curses; sys_platform == 'win32'
 https://github.com/Birch-san/k-diffusion/archive/refs/heads/mps.zip#egg=k-diffusion
 https://github.com/invoke-ai/PyPatchMatch/archive/refs/tags/0.1.5.zip#egg=pypatchmatch
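When comparing a pinned environment like this against an upgraded one, it helps to record which versions are actually installed in the active venv. The following is a small sketch using only the standard library; the function name is an assumption chosen for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["diffusers", "transformers", "torch"])
    for name, version in report.items():
        print(f"{name:<14}{version or 'not installed'}")
```

Pasting this output next to a failing or succeeding run makes it easier to tell which combination of diffusers, transformers and torch was in effect.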

Jan 28 '23 03:01 i3oc9i

@keturn This continues to be a problem on MacOS MPS systems. Is this a known issue with diffusers 0.12.1?

Feb 04 '23 18:02 lstein

Not as far as I know. I don't find any reports of "total bytes of NDArray" in the upstream bug tracker.

Feb 04 '23 19:02 keturn

the AttnProcessor stuff was only added in 0.12, so downgrading isn't an option without breaking .swap()

Feb 05 '23 20:02 damian0815

@keturn i found this, not sure if it's relevant though https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/5796

Feb 06 '23 20:02 damian0815

I am copying this comment here from the InvokeAI Version 2.3.0 discussion in order to preserve relevant information about this issue. It is not intended to ping anyone (reference https://github.com/invoke-ai/InvokeAI/discussions/2482#discussioncomment-4887572)

Here is a log that shows what I'm observing.

I have a local branch where I have downgraded transformers and diffusers, as you can see in the git show output. Here is the pip list of my environment: venv-pip-list-patched.txt

>>> git show
commit 4379b444104620cf0bca212ae5e518d98df0a9ea (HEAD -> my-fixes, tag: good)
Author: Ivano Coltellacci <[email protected]>
Date:   Sat Jan 28 12:39:43 2023 +0100

    fix: rollback to previous diffuseur version

diff --git a/environments-and-requirements/requirements-base.txt b/environments-and-requirements/requirements-base.txt
index b7a3a2a7..0c791e1d 100644
--- a/environments-and-requirements/requirements-base.txt
+++ b/environments-and-requirements/requirements-base.txt
@@ -2,7 +2,7 @@
 accelerate
 albumentations
 datasets
-diffusers[torch]~=0.12
+diffusers[torch]==0.11
 dnspython==2.2.1
 einops
 eventlet
@@ -37,7 +37,7 @@ taming-transformers-rom1504
 test-tube>=0.7.5
 torch-fidelity
 torchmetrics
-transformers~=4.26
+transformers==4.25
 windows-curses; sys_platform == 'win32'
 https://github.com/Birch-san/k-diffusion/archive/refs/heads/mps.zip#egg=k-diffusion
 https://github.com/invoke-ai/PyPatchMatch/archive/refs/tags/0.1.5.zip#egg=pypatchmatch

In the following I optimize the analog.ckpt model as a diffusers model, then I generate a 960x960 image.

>>> python ./scripts/invoke.py
* Initializing, be patient...
>> Initialization file /Users/ivano/Code/Ai/invokeai.stable/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.0+a0
>> InvokeAI runtime directory is "/Users/ivano/Code/Ai/invokeai.stable"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type mps
>> Current VRAM usage:  0.00G
>> Loading midjourney from /Users/ivano/Junk/SD/midjourney-v4.ckpt
>> Scanning Model: midjourney
>> Model scanned ok!
>> Loading midjourney from /Users/ivano/Junk/SD/midjourney-v4.ckpt
   | Forcing garbage collection prior to loading new model
   | LatentDiffusion: Running in eps-prediction mode
   | DiffusionWrapper has 859.52 M params.
   | Making attention of type 'vanilla' with 512 in_channels
   | Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
   | Making attention of type 'vanilla' with 512 in_channels
   | Using more accurate float32 precision
   | Loading VAE weights from: /Users/ivano/Code/Ai/invokeai.stable/models/ldm/stable-diffusion-v1/vae-ft-mse-840000-ema-pruned.ckpt
>> Model loaded in 4.94s
>> Model loaded in 5.03s
>> Textual inversions available:  Style-Princess
>> Setting Sampler to k_lms

* Initialization done! Awaiting your command (-h for help, 'q' to quit)
(midjourney) invoke> !models
analog                    not loaded  ckpt       Analog Diffusion v1 [photo]
clarity                   not loaded  ckpt       Clarity [photo]
kalista                   not loaded  ckpt       Kalista [general]
midjourney                    active  ckpt       Midjourney v4 [general]
sd-15                     not loaded  ckpt       Stable Diffusion version 1.5
sd-inpaint-15             not loaded  ckpt       Stable Diffusion version 1.5 (inpainting)
vintedois                 not loaded  ckpt       Vintedois v0.1 (estilovintedois)  [general]
(midjourney) invoke> 
(midjourney) invoke> !optimize analog
>> Optimizing analog (30-60s)
global_step key not found in model
>> Success. Optimized model is now located at /Users/ivano/Code/Ai/invokeai.stable/models/converted-ckpts/analog
>> Writing new config file entry for analog
>> vae-ft-mse-840000-ema-pruned VAE corresponds to known stabilityai/sd-vae-ft-mse diffusers version
>> Conversion succeeded
Load optimized model analog? [y]
>> Current VRAM usage:  0.00G
>> Offloading midjourney to CPU
>> Loading diffusers model from /Users/ivano/Code/Ai/invokeai.stable/models/converted-ckpts/analog
  | Using more accurate float32 precision
  | Loading diffusers VAE from stabilityai/sd-vae-ft-mse
  | Using more accurate float32 precision
Downloading (…)_model.safetensors";: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 335M/335M [00:10<00:00, 30.7MB/s]
Downloading (…)lve/main/config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 547/547 [00:00<00:00, 458kB/s]
  | Calculating sha256 hash of model files
  | sha256 = 7057230c8aaf0b50183cd43d7260f696f8c8b0524d2cff3d02cf24004fb57080 (15 files hashed in 7.54s)
  | Default image dimensions = 512 x 512
>> Model loaded in 21.53s
>> Textual inversions available:  Style-Princess
>> Setting Sampler to k_lms (LMSDiscreteScheduler)
Delete the original .ckpt file at (/Users/ivano/Junk/SD/analog-v1.safetensors ? [n]
(analog) invoke> an happy dog in a garden -H 960 -W 960
objc[7184]: Class CaptureDelegate is implemented in both /Users/ivano/Code/Ai/invokeai.stable/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x15527a4d0) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_videoio.4.7.0.dylib (0x30b6bc880). One of the two will be used. Which one is undefined.
objc[7184]: Class CVWindow is implemented in both /Users/ivano/Code/Ai/invokeai.stable/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x15527a520) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x2efe4cb10). One of the two will be used. Which one is undefined.
objc[7184]: Class CVView is implemented in both /Users/ivano/Code/Ai/invokeai.stable/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x15527a548) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x2efe4cb38). One of the two will be used. Which one is undefined.
objc[7184]: Class CVSlider is implemented in both /Users/ivano/Code/Ai/invokeai.stable/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x15527a570) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x2efe4cb60). One of the two will be used. Which one is undefined.
>> Patchmatch initialized
Generating:   0%| | 0/1 [00:00<?, ?it/s]
/Users/ivano/Code/Ai/invokeai.stable/.venv/lib/python3.10/site-packages/diffusers/schedulers/scheduling_lms_discrete.py:268: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [02:12<00:00,  2.66s/it]
Generating: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [02:13<00:00, 134.00s/it]
>> Usage stats:
>>   1 image(s) generated in 134.24s
Outputs:
[68] /Users/ivano/Code/Ai/@Stuffs/images/@invokeai/001244.1087789743.png: an happy dog in a garden -s 50 -S 1087789743 -W 960 -H 960 -C 7.5 -A k_lms

(analog) invoke>

I guess this demonstrates that version 0.11 of diffusers was working with sizes greater than 768x768.

Feb 09 '23 09:02 i3oc9i

NOTE SD2.1 breaks at 832x832

Feb 09 '23 13:02 i3oc9i

I have updated my venv to diffusers 0.13 and the issue is confirmed with this new version as well.

Feb 19 '23 10:02 i3oc9i

I have updated my venv to diffusers 0.13.1 and the issue is confirmed with this new version as well.

Feb 20 '23 10:02 i3oc9i

The issue is still being experienced with the release 2.3.1

Feb 27 '23 10:02 i3oc9i

i note you're using deliberate and midjourney. do you get the same problem with the base SD 1.5 or 2.1 diffusers models?

Feb 27 '23 13:02 damian0815

Yes. In the log below I switched to the stock sd-15 diffusers model and requested an 832x704 image (a non-square image).

Please note that with version 0.11 of diffusers I was able to create images up to 960x960 pixels. IMHO something broke with version 0.12 of diffusers.

* Initializing, be patient...
>> Initialization file /Users/ivano/Code/Ai/@Stuffs/invokeai.models/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.1
>> InvokeAI runtime directory is "/Users/ivano/Code/Ai/@Stuffs/invokeai.models"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type mps
>> xformers not installed
>> NSFW checker is disabled
>> Current VRAM usage:  0.00G
>> Loading midjourney from /Users/ivano/junk/sd/midjourney-v4.ckpt
>> Scanning Model: midjourney
>> Model scanned ok
>> Loading midjourney from /Users/ivano/junk/sd/midjourney-v4.ckpt
   | Forcing garbage collection prior to loading new model
   | LatentDiffusion: Running in eps-prediction mode
   | DiffusionWrapper has 859.52 M params.
   | Making attention of type 'vanilla' with 512 in_channels
   | Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
   | Making attention of type 'vanilla' with 512 in_channels
   | Using more accurate float32 precision
   | Loading VAE weights from: /Users/ivano/Code/Ai/@Stuffs/invokeai.models/models/ldm/stable-diffusion-v1/vae-ft-mse-840000-ema-pruned.ckpt
>> Model loaded in 4.99s
>> Loading embeddings from /Users/ivano/Code/Ai/@Stuffs/invokeai.models/embeddings
>> Textual inversion triggers:
>> Setting Sampler to k_lms

* --web was specified, starting web server...
* Initializing, be patient...
>> Initialization file /Users/ivano/Code/Ai/@Stuffs/invokeai.models/invokeai.init found. Loading...
>> Started Invoke AI Web Server!
>> Default host address now 127.0.0.1 (localhost). Use --host 0.0.0.0 to bind any address.
>> Point your browser at http://127.0.0.1:9090
>> System config requested
objc[1693]: Class CaptureDelegate is implemented in both /Users/ivano/Code/Ai/@Stuffs/invokeai.models/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x1795824e8) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_videoio.4.7.0.dylib (0x2bd868880). One of the two will be used. Which one is undefined.
objc[1693]: Class CVWindow is implemented in both /Users/ivano/Code/Ai/@Stuffs/invokeai.models/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x179582538) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x2bd3f0b10). One of the two will be used. Which one is undefined.
objc[1693]: Class CVView is implemented in both /Users/ivano/Code/Ai/@Stuffs/invokeai.models/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x179582560) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x2bd3f0b38). One of the two will be used. Which one is undefined.
objc[1693]: Class CVSlider is implemented in both /Users/ivano/Code/Ai/@Stuffs/invokeai.models/.venv/lib/python3.10/site-packages/cv2/cv2.abi3.so (0x179582588) and /opt/homebrew/Cellar/opencv/4.7.0_1/lib/libopencv_highgui.4.7.0.dylib (0x2bd3f0b60). One of the two will be used. Which one is undefined.
>> Patchmatch initialized
>> Model change requested: sd-15
>> Current VRAM usage:  0.00G
>> Offloading midjourney to CPU
>> Loading diffusers model from runwayml/stable-diffusion-v1-5
  | Using more accurate float32 precision
  | Loading diffusers VAE from stabilityai/sd-vae-ft-mse
  | Using more accurate float32 precision
Fetching 15 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 76818.75it/s]
  | Default image dimensions = 512 x 512
>> Model loaded in 8.70s
>> Loading embeddings from /Users/ivano/Code/Ai/@Stuffs/invokeai.models/embeddings
>> Textual inversion triggers:
>> Setting Sampler to k_lms (LMSDiscreteScheduler)

>> Image Generation Parameters:

{'prompt': 'an happy dog into a nice garden [((blurry)), duplicate, deformed, cartoon, animated, render\n]', 'iterations': 1, 'steps': 30, 'cfg_scale': 7.5, 'threshold': 0, 'perlin': 0, 'height': 832, 'width': 704, 'sampler_name': 'k_euler_a', 'seed': 715738625, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '...', 'hires_fix': False, 'seamless': False, 'variation_amount': 0}

>> ESRGAN Parameters: False
>> Facetool Parameters: False
>> Setting Sampler to k_euler_a (EulerAncestralDiscreteScheduler)
Generating:   0%| | 0/1 [00:00<?, ?it/s]
/Users/ivano/Code/Ai/@Stuffs/invokeai.models/.venv/lib/python3.10/site-packages/diffusers/schedulers/scheduling_euler_ancestral_discrete.py:299: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]
/AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:724: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
| 0/30 [00:00<?, ?it/s]
zsh: abort      invokeai --web
/opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Feb 27 '23 17:02 i3oc9i

I am also trying the in-development Invoke 3.0.0, and the issue is confirmed there too (commit 86e2cb04).

FYI, I have run

pip install --upgrade --upgrade-strategy eager --use-pep517 -e .

so the diffusers version is updated to 0.14.0

Mar 04 '23 15:03 i3oc9i

Just to understand: this happens only on macOS, right?

Does anybody have a reproducible code snippet that uses just diffusers, by any chance? This would help a lot to quickly figure out what's going on, I think :-)

Also cc @pcuenca

Mar 06 '23 10:03 patrickvonplaten

Yes, I'm a Mac user. Keep in mind that things were working with diffusers 0.11, as explained in the comment https://github.com/invoke-ai/InvokeAI/issues/2444#issuecomment-1423869116

I guess you can reproduce it using the commit of 28th January.

Mar 06 '23 17:03 i3oc9i

Hey @i3oc9i,

I sadly won't have the time to dive deeper into InvokeAI here. Could you maybe try to reproduce your issue using only diffusers code? E.g.:

from diffusers import StableDiffusionPipeline

pipe = ....

...

Mar 06 '23 18:03 patrickvonplaten

@patrickvonplaten, programming in Python is quite far from my area of knowledge, but I will give it a try later.

Anyway, IMHO this is not needed: if you look at the comments in the issue, at that time (28th January) I demonstrated that with 0.11 instead of 0.12.1 on the same InvokeAI code base, the issue did not occur.

So my assumption is that there is some kind of code regression between 0.11 and 0.12.1 on macOS machines.

Mar 06 '23 19:03 i3oc9i

@i3oc9i The repro snippet would allow us to easily test on 0.11 and 0.12.1. I tried to reproduce but couldn't; what I did was run a stable diffusion pipeline and generate a single image at 768x768 - it worked. So there must be something else in your configuration that is triggering the problem. It could be additional images being generated in a batch, an image to image task, a different base model, or something else.

In any case, the NDArray issue is hopefully being resolved in the upcoming PyTorch 2.0 release. Would it be possible for you to test using the nightly (preview) version of PyTorch: https://pytorch.org/get-started/locally/? Thanks a lot!

For reference, this was my test script, it worked on the latest main:

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch

model_id = "lambdalabs/dreambooth-avatar"
device = "mps"
seed = 1337

scheduler = DPMSolverMultistepScheduler.from_pretrained(model_id, subfolder="scheduler")
torch.manual_seed(seed)
sdm = StableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    safety_checker=None,
)
sdm = sdm.to(device)

prompt = "Yoda, avatarart style"
images = sdm(prompt, width=768, height=768, num_inference_steps=20).images
for i, image in enumerate(images):
    image.save(f"yoda_768_{device}_{seed}_{i}.png")

(This was tested on both Ventura 13.2 and 13.3 beta)

Mar 07 '23 07:03 pcuenca

@pcuenca thank you very much for your help with the snippet, I will give it a try this evening when I come back from work

Mar 07 '23 11:03 i3oc9i

@pcuenca @patrickvonplaten

I executed your snippet in a dedicated python 3.10.10 venv (pip-list.txt) using diffusers 0.11 and 0.14 with transformers 0.45 and 0.46.1 respectively, both based on torch 1.13.1, and I get the same result in the two cases

768x768 = OK
768x832 = OK
768x896 = FAIL
832x832 = FAIL

>>> So the difference between the diffusers versions cannot be reproduced with this snippet.

Afterwards I upgraded my python 3.10.10 venv (pip-list-thorch-2.1.0.txt) to torch 2.1.0.dev20230307, keeping diffusers==0.14 and transformers==0.46.1

pip3 install --pre torch --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cpu

but I get the same result

768x768 = OK
768x832 = OK
768x896 = FAIL
832x832 = FAIL

>>> So the latest PyTorch 2.1 nightly does not solve the NDArray issue, at least in this case.

(Using a Mac Studio Ultra with 128 GB RAM and macOS Ventura 13.2.1)
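Because the failed assertion aborts the whole Python process (the logs show "zsh: abort") instead of raising an exception, a size sweep like the one above has to run each size in its own process. The following is a minimal sketch of that idea using only the standard library; the function names are assumptions for illustration, and the exit-status handling assumes a POSIX system such as macOS.

```python
import signal
import subprocess
import sys
from typing import Optional


def run_isolated(code: str, timeout: Optional[float] = None) -> int:
    """Run a Python source string in a fresh interpreter and return its exit status."""
    completed = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    return completed.returncode


def describe(returncode: int) -> str:
    """Turn an exit status into the OK/FAIL notation used in this comment."""
    if returncode == 0:
        return "OK"
    if returncode < 0:
        # On POSIX a negative status means the child was killed by a signal.
        return f"FAIL (killed by {signal.Signals(-returncode).name})"
    return f"FAIL (exit code {returncode})"


if __name__ == "__main__":
    # Stand-in workload: a real sweep would pass the diffusers test script here,
    # with the desired width and height substituted into the source string.
    print(describe(run_isolated("print('generated')")))
```

A crash in one size then shows up as a FAIL line for that size only, and the loop over the remaining sizes keeps going.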

Mar 07 '23 23:03 i3oc9i

@pcuenca @patrickvonplaten

Anyway, I also recreated an InvokeAI environment (pip-list.txt) using commit 01866305140aaba07ecfd590477f2ae368e7a91f, and I was able to obtain a 960x960 image by downgrading diffusers and transformers

diff --git a/environments-and-requirements/requirements-base.txt b/environments-and-requirements/requirements-base.txt
index b7a3a2a7..0c791e1d 100644
--- a/environments-and-requirements/requirements-base.txt
+++ b/environments-and-requirements/requirements-base.txt
@@ -2,7 +2,7 @@
 accelerate
 albumentations
 datasets
-diffusers[torch]~=0.12
+diffusers[torch]==0.11
 dnspython==2.2.1
 einops
 eventlet
@@ -37,7 +37,7 @@ taming-transformers-rom1504
 test-tube>=0.7.5
 torch-fidelity
 torchmetrics
-transformers~=4.26
+transformers==4.25
 windows-curses; sys_platform == 'win32'
 https://github.com/Birch-san/k-diffusion/archive/refs/heads/mps.zip#egg=k-diffusion
 https://github.com/invoke-ai/PyPatchMatch/archive/refs/tags/0.1.5.zip#egg=pypatchmatch

I'm available to run any other tests you think would be useful to solve this issue.

Mar 07 '23 23:03 i3oc9i

Regarding the latest release (2.3.2) and the mentioned bug fix:

Upgraded to latest versions of diffusers, transformers, safetensors and accelerate libraries upstream. We hope that this will fix the assertion NDArray > 2**32 issue that MacOS users have had when generating images larger than 768x768 pixels. Please report back.

Unfortunately, it did not work. At least not for me. (Apple MacBook Pro 64gb running Ventura 13.2.1)

Mar 13 '23 08:03 pivot69

I saw a comment on the a1111 issues suggesting that what appears to be the same issue is a bug in how macOS 13.2.x interacts with PyTorch. Apparently both 13.1 and 13.3 are unaffected.

Mar 13 '23 08:03 whosawhatsis

The latest release (2.3.2) does not solve this issue... @whosawhatsis can you please give us a link to the comment you are talking about?

Mar 13 '23 11:03 i3oc9i

Hmm, looking closer at the thread in question (which is kinda all over the place), I'm less confident that it's the same issue. Here's the comment I was remembering: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/7453#discussioncomment-5285469

Mar 13 '23 20:03 whosawhatsis

I have updated to the latest commit 27a113d8 and updated my venv using the following command

pip install --upgrade --upgrade-strategy eager --use-pep517 -e .

in order to upgrade to the latest versions of the torch, diffusers and transformers modules

pip list | grep -e diffuser  -e transformer -e torch
clip-anytorch           2.5.2
diffusers               0.14.0
pytorch-lightning       1.7.7
torch                   2.0.0
torchmetrics            0.11.4
torchvision             0.15.1
transformers            4.27.1

but the issue is not solved: requesting a 704x768 image with the stock SD-1.5 model fails.

invokeai --web
* Initializing, be patient...
>> Initialization file /Users/ivano/Code/Ai/@Stuffs/invokeai.models/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 3.0.0+a0
>> InvokeAI runtime directory is "/Users/ivano/Code/Ai/@Stuffs/invokeai.models"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type mps
>> xformers not installed
>> NSFW checker is disabled
>> Current VRAM usage:  0.00G
>> Loading diffusers model from runwayml/stable-diffusion-v1-5
  | Using more accurate float32 precision
  | Loading diffusers VAE from stabilityai/sd-vae-ft-mse
  | Using more accurate float32 precision
Fetching 15 files: 100%| | 15/15 [00:00<00:00, 44119.61it/s]
  | Default image dimensions = 512 x 512
>> Loading embeddings from /Users/ivano/Code/Ai/@Stuffs/invokeai.models/embeddings
>> Textual inversion triggers: bad_prompt
>> Model loaded in 4.52s
>> Setting Sampler to k_lms (LMSDiscreteScheduler)

* --web was specified, starting web server...
Loading Python libraries...

* Initializing, be patient...
>> Initialization file /Users/ivano/Code/Ai/@Stuffs/invokeai.models/invokeai.init found. Loading...
>> Started Invoke AI Web Server
>> Default host address now 127.0.0.1 (localhost). Use --host 0.0.0.0 to bind any address.
>> Point your browser at http://127.0.0.1:9090
>> System config requested
>> Patchmatch initialized

>> Image Generation Parameters:

{'prompt': 'am happy dog in a nice garden', 'iterations': 1, 'steps': 30, 'cfg_scale': 7.5, 'threshold': 0, 'perlin': 0, 'height': 768, 'width': 704, 'sampler_name': 'k_euler_a', 'seed': 226339246, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '...', 'hires_fix': False, 'seamless': False, 'variation_amount': 0}

>> ESRGAN Parameters: False
>> Facetool Parameters: False
>> Setting Sampler to k_euler_a (EulerAncestralDiscreteScheduler)
Generating:   0%|
/AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:724: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
zsh: abort      invokeai --web
/opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Mar 18 '23 08:03 i3oc9i

@keturn @lstein @pcuenca Good news! This issue is solved by upgrading to Ventura 13.3 and PyTorch 2.0.

I'm running InvokeAI 3.0.0+a0 on commit 09dfde0b

pip list | grep -e diffuser  -e transformer -e torch
clip-anytorch           2.5.2
diffusers               0.14.0
pytorch-lightning       1.7.7
torch                   2.0.0
torchmetrics            0.11.4
torchvision             0.15.1
transformers            4.27.3

I was able to generate images up to 1088x1088; beyond that I still get Error: total bytes of NDArray > 2**32

I will propose closing the issue after other Mac users confirm.
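For anyone who wants to check whether their machine is on a macOS release at or above the one that worked here, this is a small sketch using the standard library. The 13.3 threshold comes only from the reports in this thread, and the helper names are made up for illustration.

```python
import platform


def parse_version(text: str) -> tuple[int, ...]:
    """Convert a dotted version string such as '13.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def is_at_least(version: str, minimum: tuple[int, ...]) -> bool:
    """Return True when the dotted version is greater than or equal to minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    current = platform.mac_ver()[0]  # empty string when not running on macOS
    if not current:
        print("Not running on macOS")
    elif is_at_least(current, (13, 3)):
        print(f"macOS {current}: at or above 13.3, the release reported to work here")
    else:
        print(f"macOS {current}: older than 13.3, the NDArray assertion may still occur")
```

Note that even on 13.3 the assertion was still reported for images larger than 1088x1088, so passing this check does not remove the size limit.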

Mar 27 '23 21:03 i3oc9i

Hoorah! Upgrade to Ventura 13.3 did the trick! Seems to be working with invokeai 2.3.2 also, even without newest pytorch.

pip list | grep -e diffuser  -e transformer -e torch

clip-anytorch               2.5.2
diffusers                   0.14.0
pytorch-lightning           1.7.7
taming-transformers-rom1504 0.0.6
torch                       1.13.1
torch-fidelity              0.3.0
torchdiffeq                 0.2.3
torchmetrics                0.11.4
torchsde                    0.2.5
torchvision                 0.14.1
transformers                4.26.1

It's chugging down memory though, and at 1088x1088 it quickly consumed all 64 GB of RAM, slowing my Mac, and generation speed dropped to almost a standstill. It's generating at 248s per iteration ;-) (but holding!)

I had no issues generating images at 832x832 though, something that was not possible with the previous version of Ventura. I'll keep trying to find the maximum dimensions, but will wait until I'm not depending on my Mac for work.

UPDATE 1: 1088x1088 generated in 4317.85s
UPDATE 2: Everything above 1088x1088 fails, confirming what @i3oc9i experienced

Mar 29 '23 09:03 pivot69
