
There appear to be 1 leaked semaphore objects to clean up at shutdown

Open oscarnevarezleal opened this issue 3 years ago • 61 comments

Can't complete the conversion of models to Core ML

Chip: Apple M2
Memory: 8GB
OS: 13.0.1 (22A400)
pip list
Package                        Version    Editable project location
------------------------------ ---------- ----------------------------------------------------------
accelerate                     0.15.0
certifi                        2022.9.24
charset-normalizer             2.1.1
coremltools                    6.1
diffusers                      0.9.0
filelock                       3.8.0
huggingface-hub                0.11.1
idna                           3.4
importlib-metadata             5.1.0
mpmath                         1.2.1
numpy                          1.23.5
packaging                      21.3
Pillow                         9.3.0
pip                            21.3.1
protobuf                       3.20.3
psutil                         5.9.4
pyparsing                      3.0.9
python-coreml-stable-diffusion 0.1.0      /Users/....
PyYAML                         6.0
regex                          2022.10.31
requests                       2.28.1
scipy                          1.9.3
setuptools                     60.2.0
sympy                          1.11.1
tokenizers                     0.13.2
torch                          1.12.0
tqdm                           4.64.1
transformers                   4.25.1
typing_extensions              4.4.0
urllib3                        1.26.13
wheel                          0.37.1
zipp                           3.11.0

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o packages

!!! macOS 13.1 and newer or iOS/iPadOS 16.2 and newer is required for best performance !!!
INFO:__main__:Initializing StableDiffusionPipeline with CompVis/stable-diffusion-v1-4..
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 11636.70it/s]
INFO:__main__:Done.
INFO:__main__:Converting vae_decoder
INFO:__main__:`vae_decoder` already exists at packages/Stable_Diffusion_version_CompVis_stable-diffusion-v1-4_vae_decoder.mlpackage, skipping conversion.
INFO:__main__:Converted vae_decoder
INFO:__main__:Converting unet
INFO:__main__:Attention implementation in effect: AttentionImplementations.SPLIT_EINSUM
INFO:__main__:Sample inputs spec: {'sample': (torch.Size([2, 4, 64, 64]), torch.float32), 'timestep': (torch.Size([2]), torch.float32), 'encoder_hidden_states': (torch.Size([2, 768, 1, 77]), torch.float32)}
INFO:__main__:JIT tracing..
/Users/xxx/xxx/apple/ml-stable-diffusion/venv/lib/python3.9/site-packages/torch/nn/functional.py:2515: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  _verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(input.size()[2:]))
/Users/xxx/xxx/apple/ml-stable-diffusion/python_coreml_stable_diffusion/layer_norm.py:61: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert inputs.size(1) == self.num_channels
INFO:__main__:Done.
INFO:__main__:Converting unet to CoreML..
WARNING:coremltools:Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                           | 0/7876 [00:00<?, ? ops/s]WARNING:coremltools:Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████████████████████████████████████████████████████████▉| 7874/7876 [00:01<00:00, 4105.24 ops/s]
Running MIL Common passes: 100%|███████████████████████████████████████████████████████████████████████████████| 39/39 [00:27<00:00,  1.43 passes/s]
Running MIL FP16ComputePrecision pass: 100%|█████████████████████████████████████████████████████████████████████| 1/1 [00:44<00:00, 44.50s/ passes]
Running MIL Clean up passes: 100%|█████████████████████████████████████████████████████████████████████████████| 11/11 [03:00<00:00, 16.40s/ passes]
zsh: killed     python -m python_coreml_stable_diffusion.torch2coreml --convert-unet    -o
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

oscarnevarezleal avatar Dec 02 '22 03:12 oscarnevarezleal

I had the same issue: https://github.com/apple/ml-stable-diffusion/issues/5

enzyme69 avatar Dec 02 '22 10:12 enzyme69

Same thing here for me, and in the end I'm missing the safety_checker Core ML model.

felipebaez avatar Dec 02 '22 11:12 felipebaez

Just updated the OS to 13.1 preview, still facing the same error.

oscarnevarezleal avatar Dec 02 '22 15:12 oscarnevarezleal

Same here.

Apple M1 Pro 16 GB RAM macOS 13.0.1 (22A400)

Edit: After some investigation it seems like my Mac ran out of memory. It worked well in a later attempt.

Screenshot 2022-12-03 at 14 42 41

martinlexow avatar Dec 03 '22 13:12 martinlexow

8 GB will cause an out-of-memory issue, as suggested by Yasuhito. Best if you can get a compiled model from someone, or try running again and again with only Terminal open after logging in.
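A quick pre-flight check can confirm whether RAM is the likely culprit. This is a stdlib-only sketch; the 16 GB threshold is a guess based on the reports in this thread (8 GB machines fail, 16 GB machines sometimes succeed), not a documented requirement.

```python
import os

def total_ram_gb() -> float:
    """Total physical memory in GiB (POSIX systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30

ram = total_ram_gb()
print(f"Total RAM: {ram:.1f} GiB")
if ram < 16:  # rough threshold inferred from this thread, not official
    print("Conversion may get killed by the OS; close other apps first.")
```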

enzyme69 avatar Dec 05 '22 22:12 enzyme69

Same here.

Apple M1 Pro 16 GB RAM macOS 13.0.1 (22A400)

Edit: After some investigation it seems like my Mac ran out of memory. It worked well in a later attempt.

Screenshot 2022-12-03 at 14 42 41

I have the same RAM memory on my Mac. Did you keep trying until it worked eventually?

mariapatulea avatar Apr 06 '23 11:04 mariapatulea

@mariapatulea It never worked for me.

oscarnevarezleal avatar Apr 07 '23 04:04 oscarnevarezleal

I think this is an issue with tqdm and floating point refs on the progress bar.

I get the same issue and don't have coreml installed.

tqdm    4.65.0

bensh avatar Apr 26 '23 07:04 bensh

Hi there!

Has somebody found any solution to this problem? I'm facing the same issue on M1 chip.

Siriz23 avatar May 23 '23 18:05 Siriz23

I'm facing the same issue on an M1 chip. Does anyone have a solution?

tahuuha avatar May 29 '23 12:05 tahuuha

Check the solution: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/1890

tahuuha avatar May 29 '23 12:05 tahuuha

I've got the same problem in Stable Diffusion v1.5.1 running on a MacBook M2:

anaconda3/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

AlanZhou2022 avatar Aug 17 '23 14:08 AlanZhou2022

The line you quoted is just a warning and does not cause any issues. The most common reason conversions fail is running out of memory, just like in the OP's case; look for a line that says or contains "Killed".

vzsg avatar Aug 17 '23 15:08 vzsg

I am using a MacBook Pro (M2 chip) on Ventura and facing the same issue.

gamesbykk avatar Sep 17 '23 13:09 gamesbykk

Problem solved on my side by downgrading Python to 3.10.13

frankl1 avatar Oct 07 '23 10:10 frankl1

I got this error with PyTorch mps while running tqdm=4.65.0. I was able to remove it and install 4.66.1 which solved it. Not a RAM issue.
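To check your environment before converting, here is a small stdlib-only sketch. The 4.66.1 threshold comes from reports in this thread, not from a tqdm changelog I have verified, and the version parser below only handles plain `x.y.z` strings:

```python
from importlib import metadata

def needs_upgrade(version: str, minimum: str = "4.66.1") -> bool:
    """Numeric comparison of plain dotted versions, e.g. '4.65.0' < '4.66.1'."""
    def as_tuple(v: str):
        return tuple(int(part) for part in v.split("."))
    return as_tuple(version) < as_tuple(minimum)

try:
    installed = metadata.version("tqdm")
    if needs_upgrade(installed):
        print(f"tqdm {installed} found; consider: pip install -U tqdm")
    else:
        print(f"tqdm {installed} found; no upgrade needed")
except metadata.PackageNotFoundError:
    print("tqdm is not installed")
```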

zhanwenchen avatar Oct 14 '23 03:10 zhanwenchen

I think it might be RAM related even if package versions help; they may just use memory better. It consistently failed for me, and then I closed everything on my Mac that I could and it ran fine without changing versions. 🤷

YakDriver avatar Oct 20 '23 20:10 YakDriver

I got this error with PyTorch mps while running tqdm=4.65.0. I was able to remove it and install 4.66.1 which solved it. Not a RAM issue.

I agree it's not a RAM issue, I have 96GB of RAM on a custom-built M2 model and I'm getting the error. I can guarantee it has nothing to do with RAM

chris-heney avatar Oct 22 '23 02:10 chris-heney

+1 with the error. M1 Max 64GB

42piratas avatar Nov 07 '23 13:11 42piratas

Getting the same error when training Dreambooth. Did anyone figure out a solution to this?

loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<1x1280xf16>' and 'tensor<1280xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
./webui.sh: line 255: 38149 Abort trap: 6           "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"
/opt/homebrew/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

mo-foodbit avatar Nov 15 '23 06:11 mo-foodbit

It's not the same error though. Yours was:

error: input types 'tensor<1x1280xf16>' and 'tensor<1280xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).

The warning about the semaphore, just like in the OP (where the real error was zsh: killed, due to running out of memory), is just a red herring that gets printed after both successful and failed conversions.

vzsg avatar Nov 15 '23 11:11 vzsg

I have the same error on a M3 model with 36GB memory! :(

mossishahi avatar Dec 19 '23 01:12 mossishahi

Same issue on M3 with 128GB ram

LukaVerhoeven avatar Dec 28 '23 10:12 LukaVerhoeven

@LukaVerhoeven nice config^ 🙂

julien-c avatar Jan 02 '24 08:01 julien-c

@LukaVerhoeven nice config^ 🙂

Was hoping on no memory issues with this setup 😒

LukaVerhoeven avatar Jan 10 '24 07:01 LukaVerhoeven

It seems related to the device type (the Mac `mps` backend). When I move the `mps` tensor to the CPU with `.cpu()`, the problem no longer appears.

zzingae avatar Jan 17 '24 15:01 zzingae

Same error on an M3 Max 96GB while trying to run InvokeAI. Any solution?

lemonsz15 avatar Jan 20 '24 17:01 lemonsz15

I think this is an issue with tqdm and floating point refs on the progress bar.

I get the same issue and don't have coreml installed.

tqdm    4.65.0

Removing tqdm solved my issue. Thank you!

Blenderama avatar Jan 30 '24 14:01 Blenderama

In my opinion, this happens because you run it in Docker, where the shared-memory (shm) size is small. You can run df -lh to check its size. I created the container with --shm-size=2G and then it ran successfully.
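For anyone in this Docker situation, a sketch of the check and the fix (only the `--shm-size` flag comes from the comment above; the image name would be your own):

```shell
# Inside the container: check the shared-memory mount.
# Docker's default /dev/shm is often a small 64M tmpfs.
df -h /dev/shm

# On the host, recreate the container with a larger shm size, e.g.:
#   docker run --shm-size=2g <your-image> ...
```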

yunshiyu11 avatar Apr 07 '24 13:04 yunshiyu11

Same here on Apple M3 Max 36GB MacBook Pro. Never installed CoreML. Upgrading from tqdm=4.65.0 to 4.66.1 solves the problem.

chenyangkang avatar Apr 17 '24 04:04 chenyangkang