
[Bug]: M1 Mac textual inversion extremely slow

Open HamsterGerbil opened this issue 1 year ago • 11 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

First, I have to say thank you to AUTOMATIC1111 and the devs for your incredible work; this is an incredible tool. Right now I'm trying to use textual inversion and/or hypernetwork training, but after creating the embedding and processing my images, each training step takes approximately thirty seconds for both textual inversion and hypernetworks.

Any help would be greatly appreciated. I really love the webui and textual inversion is simply astonishing.

Steps to reproduce the problem

  1. Go to training on the webui.
  2. Create an image embedding, then go to "Preprocess images" and process 512x512 images from a source directory into a destination directory.
  3. Run training with embedding and processed directory. Default settings (learning rate 0.005, batch size 1, log directory textual_inversion, prompt template file /Users/jesse/Documents/stable-diffusion-webui/textual_inversion_templates/style_filewords.txt, save images with embedding in PNG chunks.)
  4. Click train embedding

What should have happened?

The embedding should have trained at around 1.5-3.8 s/it. This is the same speed it usually runs at when generating an image from a prompt in the webui, or when training textual inversion on a Google Colab.

Commit where the problem happens

98947d1

What platforms do you use to access the UI?

MacOS

What browsers do you use to access the UI?

Google Chrome, Apple Safari

Command Line Arguments

To fix an error where all embeddings failed to load, I modified Run_webui_mac.sh to disable safe unpickling (I'm not running other people's embeddings, so I felt it was reasonably safe).

Here is the exact line:
python webui.py --disable-safe-unpickle --precision full --no-half --use-cpu Interrogate GFPGAN CodeFormer $@

Additional information, context and logs

Terminal window results:

Loaded a total of 1 textual inversion embeddings.
Embeddings: Don
100%|█████████████████████████████████████████████| 8/8 [00:01<00:00, 4.60it/s]
Training at rate of 0.005 until step 100000
Preparing dataset...
100%|█████████████████████████████████████████████| 7/7 [00:01<00:00, 3.73it/s]
[Epoch 0: 3/700] loss: 0.0034546: 0%| | 3/100000 [01:31<867:30:33, 31.23s/it]

HamsterGerbil avatar Nov 17 '22 17:11 HamsterGerbil
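As a sanity check on that log line, the ~867-hour ETA follows directly from the per-step rate tqdm reports. A short sketch using the numbers from the log above (`eta_hours` is just an illustrative helper):

```python
def eta_hours(rate_s_per_it: float, done: int, total: int) -> float:
    """Remaining wall-clock time in hours at a constant per-step rate."""
    return (total - done) * rate_s_per_it / 3600.0

# Numbers from the log: 31.23 s/it, step 3 of 100000.
slow = eta_hours(31.23, 3, 100000)
# Top of the expected range from the report above: 3.8 s/it.
fast = eta_hours(3.8, 3, 100000)

print(f"{slow:.0f} h at 31.23 s/it vs {fast:.0f} h at 3.8 s/it")
# → 867 h at 31.23 s/it vs 106 h at 3.8 s/it
```

So even at the expected GPU rate, 100,000 steps is a multi-day run; the default step count matters as much as the per-step speed.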

I'm getting similar performance in Dreambooth now. I think it's falling back to CPU?

marinohardin avatar Nov 17 '22 18:11 marinohardin

In my case, I was also getting ~30 s/it, and when I checked Activity Monitor, it was taking up ~30% CPU (on M1 Pro) and zero GPU. Can you check if that's happening for you too?

Do you get any errors or warnings in terminal about CUDA not found?

marinohardin avatar Nov 17 '22 18:11 marinohardin

I checked the GPU usage in Activity Monitor and found that on average it uses about 50% GPU, fluctuating between roughly 20% at the low end and 70% at best. Meanwhile, standard image generation uses about 80% GPU. I'm still new to coding, so there might be something obvious I'm missing when reading the monitor.

[Screenshot: Activity Monitor, 2022-11-17 10:49 AM]

HamsterGerbil avatar Nov 17 '22 18:11 HamsterGerbil

I do get this warning about torch not being compiled with CUDA enabled.

Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
LatentDiffusion: Running in eps-prediction mode

Another warning I get is:

WARNING: overwriting environment variables set in the machine
overwriting variable {'PYTORCH_ENABLE_MPS_FALLBACK'}

HamsterGerbil avatar Nov 17 '22 20:11 HamsterGerbil
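That second warning only means the launch script re-exports a variable that is already set. As a minimal sketch of what `PYTORCH_ENABLE_MPS_FALLBACK` does, assuming standard PyTorch behavior (the variable is read when torch initializes, so it must be in the environment before `import torch`):

```python
import os

# PYTORCH_ENABLE_MPS_FALLBACK=1 asks PyTorch to run operators that have
# no MPS implementation on the CPU instead of raising NotImplementedError.
# It must be exported before `import torch` to take effect.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

print("MPS fallback:", os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```

Note that this per-op fallback can itself be a source of slow steps: every operator that falls back copies its tensors between GPU and CPU memory.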

Did you check the memory usage? On my system, performance is only acceptable at 400x400 or below.

julianko13 avatar Dec 13 '22 22:12 julianko13

Same issue, super slow

yrik avatar Dec 28 '22 20:12 yrik

Super slow here too: training that usually takes 3 to 5 hours is now estimated at 29 hours!

PC, 64 GB memory, NVIDIA GPU with 8 GB VRAM, SSD

Torcelllo avatar May 21 '23 13:05 Torcelllo

The issue is that it falls back to the CPU when you enable --no-half and the other precision options. So this won't work; otherwise you'd be training an embedding for a month. https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon#poor-performance

andupotorac avatar Jun 07 '23 15:06 andupotorac
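One way to confirm where training is actually running is to ask PyTorch directly. A sketch assuming a PyTorch build recent enough to ship the MPS backend; `pick_device` is a hypothetical helper for illustration, not part of webui:

```python
import torch

def pick_device() -> str:
    """Return the best available torch device name, preferring Apple's MPS."""
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

dev = pick_device()
x = torch.ones(2, 2, device=dev)  # allocate a tensor on the chosen device
print(f"selected device: {dev}, tensor lives on: {x.device}")
```

If this prints "cpu" on an M1 Mac, PyTorch cannot see the GPU at all, and the ~30 s/it rate in the original report is the expected CPU speed rather than a webui bug.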

The issue is that it falls back to the CPU when you enable --no-half and the other precision options. So this won't work; otherwise you'd be training an embedding for a month. https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon#poor-performance

So does this mean there is currently no solution for running textual inversion on an M1 Mac, other than being willing to train an embedding for a month?

ananoman avatar Jun 21 '23 19:06 ananoman

There is. I was able to train with A1111 but ran into issues with it, so I'm now using InvokeAI's terminal GUI with these settings, and it's been going fine. I still plan to try A1111 again for training at some point.

For LoRAs I use Kohya.

[Screenshots: InvokeAI training settings, 2023-06-21]

andupotorac avatar Jun 21 '23 20:06 andupotorac

With these settings, for example, I trained a cap embedding and then put the cap on a goat. [Images: generated results]

andupotorac avatar Jun 21 '23 20:06 andupotorac