
poor performance on M2 8/16 gb

Open josaatt opened this issue 2 years ago • 41 comments

I have a MacBook Air M2 and can't get correctly rendered images. I'm using version 8.05 and PyTorch 2.0. The command output looks fine, but another problem is that most of the images are detected as NSFW.

imagine "a forest"
🤖🧠 imaginAIry received 1 prompt(s) and will repeat them 1 times to create 1 images.
Generating 🖼 1/1: "a forest" 512x512px negative-prompt:"Ugly, duplication, duplicates, mutilation, deformed, mutilated, mutation, twisted body, disfigured, bad anatomy, out of frame, extra fingers, mutated hands, poorly drawn hands, extra limbs, malformed limbs, missing arms, extra arms, missing legs, extra legs, mutated hands, extra hands, fused fingers, missing fingers, extra fingers, long neck, small head, closed eyes, rolling eyes, weird eyes, smudged face, blurred face, poorly drawn face, mutation, mutilation, cloned face, strange mouth, grainy, blurred, blurry, writing, calligraphy, signature, text, watermark, bad art," seed:824252437 prompt-strength:7.5 steps:15 sampler-type:k_dpmpp_2m
Loading model /Users/X/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/889b629140e71758e1e0006e355c331a5744b4bf/v1-5-pruned-emaonly.ckpt onto mps:0 backend...
100%|█████████████████████| 15/15 [03:45<00:00, 15.05s/it]
Image Generated. Timings: conditioning:0.46s sampling:225.91s safety-filter:5.06s total:236.52s
🖼 [generated] saved to: ./outputs/generated/000006_824252437_kdpmpp2m15_PS7.5_a_forest_[generated].jpg

[attached image: 000006_824252437_kdpmpp2m15_PS7.5_a_forest_[generated].jpg]

josaatt avatar Jan 26 '23 12:01 josaatt

Hi, I also have a MacBook Air M2 and didn't see any issues like yours. I used the recommended install into a venv, and at least my torch version is reported as 1.13.1. My Python is 3.10.9 (I had problems with 3.11).

iandol avatar Jan 27 '23 03:01 iandol

@iandol thanks for the additional context.

@josaatt I just released version 8.2.0 which includes the aimg system-info command to help with debugging. Please run it and provide the output here.

Also, I notice that the images took much longer than they should on an M2: 15s/iteration. On my M1 I'm getting roughly an iteration per second.

Closing until we get more info.

brycedrennan avatar Jan 27 '23 05:01 brycedrennan

Here is the info for my working system for comparison:

▶︎ aimg system-info
imaginairy_version:            8.2.0
imaginairy_path:               /Users/ian/.venv/imagine/lib/python3.10/site-packages/imaginairy
python_version:                3.10.9
python_installation_path:      /Users/ian/.venv/imagine/bin/python3.10
device:                        mps:0
torch_version:                 1.13.1
platform:                      darwin
hardware_description:          macOS-13.2-arm64-arm-64bit

[attached image: 000002_909458166_kdpmpp2m15_PS7.5_a_forest_[generated].jpg]

▶︎ imagine "a forest"
received 1 prompt(s) and will repeat them 1 times to create 1 images.
Generating 🖼  1/1: "a forest" 512x512px negative-prompt:"Ugly, duplication, duplicates, mutilation, deformed, mutilated, mutation, twisted body, disfigured, bad anatomy, out of frame, extra fingers, mutated hands, poorly drawn hands, extra limbs, malformed limbs, missing arms, extra arms, missing legs, extra legs, mutated hands, extra hands, fused fingers, missing fingers, extra fingers, long neck, small head, closed eyes, rolling eyes, weird eyes, smudged face, blurred face, poorly drawn face, mutation, mutilation, cloned face, strange mouth, grainy, blurred, blurry, writing, calligraphy, signature, text, watermark, bad art," seed:909458166 prompt-strength:7.5 steps:15 sampler-type:k_dpmpp_2m
Loading model /Users/ian/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/889b629140e71758e1e0006e355c331a5744b4bf/v1-5-pruned-emaonly.ckpt onto mps:0 backend...
100%|██████████████████████████████████████████████████████| 15/15 [02:23<00:00,  9.54s/it]
    Image Generated. Timings: conditioning:0.23s sampling:143.16s decoding:0.93s safety-filter:3.82s total:148.94s
    [generated] saved to: ./outputs/generated/000002_909458166_kdpmpp2m15_PS7.5_a_forest_[generated].jpg

Off-topic, but my performance at 9.54s/it is closer to @josaatt's than to yours. My machine has 16GB of memory and my GPU has 10 cores.

iandol avatar Jan 27 '23 14:01 iandol

Leaving this open, at least for the poor MPS performance.

brycedrennan avatar Jan 27 '23 17:01 brycedrennan

Made a clean install of Python and its packages, and now it renders correctly. I'm still getting poor performance, though, on my 8 GB M2 with 10 GPU cores.

imagine "a forest"
received 1 prompt(s) and will repeat them 1 times to create 1 images.
Generating 🖼 1/1: "a forest" 512x512px negative-prompt:"Ugly, duplication, duplicates, mutilation, deformed, mutilated, mutation, twisted body, disfigured, bad anatomy, out of frame, extra fingers, mutated hands, poorly drawn hands, extra limbs, malformed limbs, missing arms, extra arms, missing legs, extra legs, mutated hands, extra hands, fused fingers, missing fingers, extra fingers, long neck, small head, closed eyes, rolling eyes, weird eyes, smudged face, blurred face, poorly drawn face, mutation, mutilation, cloned face, strange mouth, grainy, blurred, blurry, writing, calligraphy, signature, text, watermark, bad art," seed:335074117 prompt-strength:7.5 steps:15 sampler-type:k_dpmpp_2m
Loading model /Users/X/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/889b629140e71758e1e0006e355c331a5744b4bf/v1-5-pruned-emaonly.ckpt onto mps:0 backend...
100%|█████████████████████| 15/15 [13:08<00:00, 52.54s/it]
Image Generated. Timings: conditioning:0.44s sampling:788.22s decoding:2.36s safety-filter:5.64s total:796.96s
[generated] saved to: ./outputs/generated/000001_335074117_kdpmpp2m15_PS7.5_a_forest_[generated].jpg

aimg system-info
imaginairy_version:            8.2.0
imaginairy_path:               /opt/homebrew/lib/python3.10/site-packages/imaginairy
python_version:                3.10.9
python_installation_path:      /opt/homebrew/opt/[email protected]/bin/python3.10
device:                        mps:0
torch_version:                 1.13.1
platform:                      darwin
hardware_description:          macOS-13.2-arm64-arm-64bit

josaatt avatar Jan 27 '23 18:01 josaatt

Installed with an Anaconda environment (Python 3.8.5, PyTorch 1.12.1, torchvision 0.13.1, NumPy 1.20.3) on an 8 GB M2 Mac mini, no issues. About 8s/it.

Astropulse avatar Jan 27 '23 19:01 Astropulse

I wonder if there is anything here that can help: https://huggingface.co/docs/diffusers/optimization/mps — I admit I am completely ignorant of the underlying tools and what is being used...

iandol avatar Jan 29 '23 13:01 iandol

@iandol thanks - I believe I have implemented the attention slicing for MPS but I think that's the first thing I should check to see if it's working properly.
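For anyone following along, here is a minimal NumPy sketch of what attention slicing does in general; this is not imaginAIry's actual implementation, just the idea: instead of materializing the full attention-score matrix at once, process the query rows in chunks so that only one small slice of scores is live at a time, trading a little speed for a much smaller peak memory footprint.

```python
import numpy as np

def attention(q, k, v):
    # Full scaled dot-product attention: materializes the entire
    # (n_q, n_k) score matrix at once -- O(n_q * n_k) peak memory.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def sliced_attention(q, k, v, slice_size=64):
    # Same result, but computed over chunks of query rows, so only a
    # (slice_size, n_k) score matrix is alive at any moment.
    out = np.empty((q.shape[0], v.shape[1]), dtype=q.dtype)
    for i in range(0, q.shape[0], slice_size):
        out[i:i + slice_size] = attention(q[i:i + slice_size], k, v)
    return out
```

Both functions produce identical outputs; the sliced version just bounds the size of the temporary score matrix, which is the kind of trick the diffusers MPS docs linked above recommend for low-memory Apple Silicon machines.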

brycedrennan avatar Jan 29 '23 15:01 brycedrennan

Just for some more context: problems with M1 from another SD python tool: https://github.com/invoke-ai/InvokeAI/issues/2428

iandol avatar Jan 30 '23 09:01 iandol

I tried invokeAI using the same SD1.5 model, same seed, same number of iterations for "a forest" as I posted above:

(stable-diffusion-1.5) invoke> a forest -S 824252437 -s 15
>> patchmatch.patch_match: INFO - Compiling and loading c extensions from "/Users/ian/.venv/invokeai/lib/python3.10/site-packages/patchmatch".
>> patchmatch.patch_match: ERROR - patchmatch failed to load or compile (Command 'make clean && make' returned non-zero exit status 2.).
>> patchmatch.patch_match: INFO - Refer to https://github.com/invoke-ai/InvokeAI/blob/main/docs/installation/INSTALL_PATCHMATCH.md for installation instructions.
>> Patchmatch not loaded (nonfatal)
Generating:   0%|                                                    | 0/1 [00:00<?, ?it/s]/Users/ian/.venv/invokeai/lib/python3.10/site-packages/diffusers/schedulers/scheduling_lms_discrete.py:268: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]
100%|██████████████████████████████████████████████████████| 15/15 [00:40<00:00,  2.72s/it]
Generating: 100%|████████████████████████████████████████████| 1/1 [00:42<00:00, 42.42s/it]
>> Usage stats:
>>   1 image(s) generated in 42.92s
Outputs:
[12] /Users/ian/Documents/invokeai/outputs/000002.824252437.png: a forest -s 15 -S 824252437 -W 512 -H 512 -C 7.5 -A k_lms

I got 2.72s/it compared to 9.54s/it from imaginAIry (43 secs vs. 149 secs). The initialisation shows the following:

* Initializing, be patient...
>> Initialization file /Users/ian/Documents/invokeai/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.0+a0
>> InvokeAI runtime directory is "/Users/ian/Documents/invokeai"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type mps
>> Current VRAM usage:  0.00G
>> Loading diffusers model from runwayml/stable-diffusion-v1-5
  | Using more accurate float32 precision
  | Loading diffusers VAE from stabilityai/sd-vae-ft-mse
  | Using more accurate float32 precision
Fetching 15 files: 100%|███████████████████████████████| 15/15 [00:00<00:00, 216946.76it/s]
  | Default image dimensions = 512 x 512
>> Model loaded in 14.78s
>> Textual inversions available: 
>> Setting Sampler to k_lms (LMSDiscreteScheduler)

The sampler is different, and it is using Diffusers; whatever impact that may have on performance...

iandol avatar Jan 31 '23 01:01 iandol

I believe Diffusers utilizes CoreML; that may explain the performance difference.

Astropulse avatar Jan 31 '23 04:01 Astropulse

AFAICT, InvokeAI is using the standard SD model, not one converted to CoreML... So far only MochiDiffusion is using converted files: https://github.com/godly-devotion/MochiDiffusion/wiki/How-to-convert-ckpt-or-safetensors-files-to-Core-ML

iandol avatar Jan 31 '23 05:01 iandol

https://twitter.com/pcuenq/status/1620080393226706945

brycedrennan avatar Jan 31 '23 11:01 brycedrennan

Diffusers and Draw Things generate images with 15 steps in about 45-75 seconds with the SD 1.5 model. imaginAIry and Amazing AI take several minutes.

josaatt avatar Jan 31 '23 22:01 josaatt

On my 16 GB MacBook Air, using SD 1.5 and 15 steps for the same image, "a forest", the new Diffusers app takes ~13 seconds (0.8s/it), and the app uses only ~4GB at its peak!

I think getting imaginAIry's performance closer to other general Python tools like InvokeAI shouldn't require supporting Apple's converted models, though obviously CoreML is an order of magnitude faster than the generic MPS interface. Personally, I like imaginAIry's interface more than the other tools'.
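For a rough sense of why memory dominates here, some back-of-envelope arithmetic on the weight footprint of SD 1.5 at different precisions (parameter counts below are approximate and from memory, not measured: UNet ~860M, CLIP text encoder ~123M, VAE ~84M):

```python
# Approximate parameter counts for Stable Diffusion 1.5 components;
# adjust these if your checkpoint differs.
PARAMS = {"unet": 860e6, "text_encoder": 123e6, "vae": 84e6}

def weights_gb(bytes_per_param):
    """Gigabytes needed just to hold the weights at a given precision."""
    return sum(PARAMS.values()) * bytes_per_param / 1024**3

fp32 = weights_gb(4)  # float32: roughly 4 GB for weights alone
fp16 = weights_gb(2)  # float16: roughly half that
print(f"fp32 ~ {fp32:.1f} GB, fp16 ~ {fp16:.1f} GB")
```

At float32 the weights alone approach 4 GB before any activations, which would explain why an 8 GB machine ends up swapping while a float16 pipeline fits comfortably under ~4 GB peak.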

iandol avatar Feb 01 '23 00:02 iandol

Draw Things supports both; here is the developer's response on how he does it:

Thanks for the kind words! We support CoreML 1.x models at 512×512 resolution and implemented something called swift-fickling to support pickle-format checkpoint files. By reverse-engineering CoreML's mlmodelc format, we can generate a proper mlmodelc file on the fly from the pickle-format checkpoint. Other resolutions and 2.x models are supported through Metal, which is fast enough and quite memory-efficient. We will implement 2.x model support in CoreML in the future, which should have a 50% speedup over the current implementation!

iandol avatar Feb 06 '23 04:02 iandol

Just to add my stats here: Draw Things made an image in 10 seconds, imaginAIry in 12. M1, 32 GB.

Drawthings is a pretty cool app - I hadn't tried it till just now!

brycedrennan avatar Feb 06 '23 15:02 brycedrennan

@brycedrennan - so I think memory use must be the deciding factor, as there is no way I can get 12 seconds from imaginAIry on a 16GB machine under any circumstances. What is your memory consumption during a run?
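For whoever measures this: Activity Monitor works, but here is a quick standard-library sketch for capturing peak memory from inside a Python session (the helper name is mine, not part of imaginAIry):

```python
import resource
import sys

def peak_rss_gb():
    # Peak resident set size of this process so far. Caveat: ru_maxrss
    # is reported in bytes on macOS but in kilobytes on Linux.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024**3 if sys.platform == "darwin" else 1024**2
    return peak / divisor

# Call it right after a generation finishes:
print(f"peak RSS: {peak_rss_gb():.2f} GB")
```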

Indeed, DrawThings has a really great AI, and it juggles resources so well...

iandol avatar Feb 07 '23 15:02 iandol

I was looking at this last night. What specific version of InvokeAI did you use? They've made a bunch of changes recently.

brycedrennan avatar Feb 07 '23 15:02 brycedrennan

My experience with invoke and automatic last night was "wow these are hard to install and have lots of bugs"

brycedrennan avatar Feb 07 '23 15:02 brycedrennan

🤣 My experience with InvokeAI exactly!!! It forces a dumb location for the venv that breaks my shell integration; it's so complicated. imaginAIry is a breeze in comparison! Currently my InvokeAI venv is broken and I can't seem to reinstall, but it was the version from last week...

iandol avatar Feb 07 '23 15:02 iandol

Another data point - 45s/it on M2, 8GB. System gets very unstable when running imaginAIry.

aimg system-info
imaginairy_version:            9.0.2
imaginairy_path:               /opt/homebrew/lib/python3.10/site-packages/imaginairy
python_version:                3.10.10
python_installation_path:      /opt/homebrew/opt/[email protected]/bin/python3.10
device:                        mps:0
torch_version:                 1.13.0
platform:                      darwin
hardware_description:          macOS-13.1-arm64-arm-64bit

bluusun avatar Feb 12 '23 02:02 bluusun

I started looking at this, but it's hard to investigate well without an actual 8 GB machine. I got distracted with allowing imaginAIry to output very large images (like 4K). Now I need to integrate https://github.com/lllyasviel/ControlNet

brycedrennan avatar Feb 12 '23 02:02 brycedrennan

Wow 4K!!!! What's ControlNet do? I'm testing the DB add-on this weekend.

bluusun avatar Feb 12 '23 02:02 bluusun

everything? :-) check out the link

brycedrennan avatar Feb 12 '23 02:02 brycedrennan

I'm dropping the 4k work to pursue controlnet integration ASAP

brycedrennan avatar Feb 12 '23 02:02 brycedrennan

My feeling with imaginAIry is that a machine with 32 GB of unified memory will perform better and faster; between M1 and M2 there are maybe just slight differences.

Using an 8 GB M1 iMac, I remember needing 10+ minutes to produce an image.

Now with a 16 GB M2 Mac mini (it just arrived yesterday), I need only 3-4 minutes. I really should have gone for the 32 GB M2... hmm

I do get really good speed out of DrawThings and Automatic1111.

enzyme69 avatar Feb 14 '23 06:02 enzyme69

So to be clear, you're saying automatic is faster on all machines? or just the 8 gb ones?

brycedrennan avatar Feb 14 '23 06:02 brycedrennan

For imaginAIry, a 16 GB machine is faster; 32 GB is even better.

Automatic1111 is somewhat faster.

Draw Things using CoreML is really fast: 3-4 generations per minute.

enzyme69 avatar Feb 14 '23 06:02 enzyme69

Tried a fresh install on another 8GB M1. 39s/it. System unstable when generating images.

bluusun avatar Feb 14 '23 16:02 bluusun