
Image generation not working with controlNet on iPadOS. Out of memory crash

SaladDays831 opened this issue on Jun 21, 2023 · 11 comments

Hi! :)

I'm running into an out-of-memory crash on a physical iPad Pro M2 (2022) when trying to generate an image with a controlNet model.

Specs:
iPad Pro 11-inch (4th generation)
iPadOS 17.0
Xcode 15.0.0
ml-stable-diffusion 1.0.0

Models used:
Stable Diffusion 1.5: https://huggingface.co/runwayml/stable-diffusion-v1-5
ControlNet Canny: https://huggingface.co/lllyasviel/sd-controlnet-canny

Parameters:
imageCount = 1
stepCount = 15
scheduler = .dpmSolverMultistepScheduler
disableSafety = true
reduceMemory = true
computeUnits = .cpuAndNeuralEngine
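For reference, this is roughly how those settings map onto the Swift API, as a minimal sketch (the resource path, ControlNet model name, and prompt are placeholders, and property names may vary slightly between ml-stable-diffusion versions):

import CoreML
import StableDiffusion

// Core ML compute units: CPU + Neural Engine, as in the settings above.
let mlConfig = MLModelConfiguration()
mlConfig.computeUnits = .cpuAndNeuralEngine

// Placeholder path: the folder produced by --bundle-resources-for-swift-cli.
let resourcesURL = URL(fileURLWithPath: "/path/to/Resources")

// The controlNet name is a placeholder and must match the converted .mlmodelc.
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourcesURL,
    controlNet: ["LllyasvielSdControlnetCanny"],
    configuration: mlConfig,
    disableSafety: true,
    reduceMemory: true
)
try pipeline.loadResources()

var config = StableDiffusionPipeline.Configuration(prompt: "a placeholder prompt")
config.imageCount = 1
config.stepCount = 15
config.schedulerType = .dpmSolverMultistepScheduler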

Converted the models using the provided python package with this command:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker --model-version "runwayml/stable-diffusion-v1-5" --chunk-unet --unet-support-controlnet --convert-controlnet "lllyasviel/sd-controlnet-canny" --bundle-resources-for-swift-cli -o "/path/to/save/the/converted/model"

I also tried a model converted with an additional --quantize-nbits 6 argument; same result.

The StableDiffusionPipeline is successfully created, and the out-of-memory crash happens when the generator is already sending progress callbacks.
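Continuing the sketch above, the crash happens inside the generation call while the progress handler is being invoked (the closure shape is from the version I'm using and may differ slightly):

// The OOM crash happens in here, after a few progress callbacks have already fired.
let images = try pipeline.generateImages(configuration: config) { progress in
    print("Step \(progress.step) of \(progress.stepCount)")
    return true   // return false to cancel generation
}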

Without controlNet the generation process is smooth and really fast. The crash happens only on a physical device; on an iPad simulator it successfully generates an image (though it takes ages).

The heaviest work is happening inside the Unet.predictions(from:) method (screenshot attached: CleanShot 2023-06-21 at 11 24 15@2x).

P.S. Thanks for such an exciting package! Hoping to get to play around with controlNet soon :)

SaladDays831 avatar Jun 21 '23 08:06 SaladDays831

Have you tried a conversion that uses both --quantize-nbits 6 and --attention-implementation SPLIT_EINSUM_V2? Along with --chunk-unet, that should get you every optimization available for mobile devices.

jrittvo avatar Jun 21 '23 13:06 jrittvo

@jrittvo just tried converting the same models using the same command, but with --attention-implementation SPLIT_EINSUM_V2, no changes :(

I wonder if anyone has had any luck using controlNet on iOS/iPadOS devices. I've tested it on macOS (MochiDiffusion), though with the original models (the ones you converted, by the way 😁), but I haven't seen any examples on iOS/iPadOS.

SaladDays831 avatar Jun 21 '23 15:06 SaladDays831

You're the first I've heard of experimenting with ControlNet on iOS. Is the screen cap you posted reading from your physical device or from the simulator? Either way, what are the comparable numbers without a ControlNet? I've wondered what kind of memory load a ControlNet adds.

As a side note, --quantize-nbits also reduces the size of the converted ControlNet model. On a Mac you can use 6-bit ControlNet models with 16-bit base models, and I see no change in quality at all with the reduced-size ControlNet model. I think I'm going to convert all my CN models to the smaller size for use in any situation.

jrittvo avatar Jun 21 '23 15:06 jrittvo

The list of target devices says M1 for iPad (A14 on iPhone). Maybe that is a hard and fast limit? It seems it would also fail on the simulator, though, if the simulator is truly an accurate simulation of your device's specs.

jrittvo avatar Jun 21 '23 15:06 jrittvo

Thanks for the report @SaladDays831! Are you able to share any logs from right around the time it crashes? Could you also try without --chunk-unet? After --quantize-nbits 6 you no longer need to chunk the Unet, and it might be causing some issues.

atiorh avatar Jun 21 '23 18:06 atiorh

Hi @atiorh! Thanks for your response. As you suggested, I tried running it with an un-chunked model and it did... something 😄 It gets to about step 5/15, then the process gets detached from Xcode, but the generation keeps running on the iPad. It finishes successfully, but the generated image always ends up being a mess (example attached). I've gotten similar results when using a faulty model, or a model not compatible with the selected computeUnit (cpuAndNeuralEngine in my case).

Just to clarify: on the iPad simulator, using the model generated by the command in the issue description, I get good results from controlNet.

Which logs exactly would be helpful? No errors get printed to the Xcode console, and Console.app connected to the iPad doesn't seem to have anything of interest at the time of the crash.

SaladDays831 avatar Jun 21 '23 22:06 SaladDays831

This may be a red herring, but with the Swift CLI I think I remember the step counter progress line in Terminal showing that, with a CN, the step count it uses was half the count I set in my prompt command. So if you prompted 15 steps, you may only be getting 7. Your image looks like more than 7 steps, but maybe try a prompt with 30 steps?

jrittvo avatar Jun 21 '23 23:06 jrittvo

@jrittvo yeah, I also noticed this, but for me it actually happens only with img2img without CN. CN and text2img don't halve the step count. I tried increasing the step count anyway and noticed that it still crashes around step 18-21.

I also just tested another model converted without --chunk-unet and without --attention-implementation SPLIT_EINSUM_V2, and I'm getting the same behavior as in my previous comment.

SaladDays831 avatar Jun 21 '23 23:06 SaladDays831

I deleted an earlier post. The issue was with the app I built: it was still using ml-stable-diffusion 0.4.0. I'll try building it again tomorrow with 1.0.0 in Xcode 15 beta and test in macOS 14 beta.

jrittvo avatar Jun 22 '23 05:06 jrittvo

@jrittvo thanks for your input! I'll test different model combinations/parameters and post the results here once I have something.

Anyone that got controlNet running on iOS/iPadOS - please let me know! 😁

SaladDays831 avatar Jun 22 '23 09:06 SaladDays831

I built Mochi Diffusion with ml-stable-diffusion 1.0.0 and Xcode 15 beta 2 and ran it in macOS 14 beta 2. So far I've only tested without ControlNets. The split-einsum-v1 models, using NE, run when they feel like it. The split-einsum-v2 model, using NE, won't even load successfully. All model types work fine with CPU and GPU. This new build of Mochi has a memory de-allocation bug (it tries to delete the old model twice) when changing models or compute units, which we need to fix. Maybe that also has something to do with the s-e-v2 model not running on NE. I will set up a conda environment to run the Swift CLI Stable Diffusion pipeline in the next day or two, to test without the Mochi variable in the mix.

jrittvo avatar Jun 22 '23 11:06 jrittvo

Have you gotten anywhere with your iPad and the ControlNets? I have a Swift CLI conda environment with ml-stable-diffusion 1.0.0 running in macOS 14 beta 2 with Xcode 15 beta 2 now. The issue I had with the split-einsum-v2 models using cpuAndNeuralEngine in the Mochi app is also happening in the Swift CLI, so the bug is not coming from the app. The same bug is present using the split-einsum-v2 models with ml-stable-diffusion 0.4.0 running in macOS 13.5 with Xcode 14.3, but I don't know that the split-einsum-v2 models were ever intended to work in that environment, so I'm going to keep this just about macOS 14, Xcode 15, and ml-stable-diffusion 1.0.0.

Sometime over the weekend I will open an issue for my bug. I believe it is reproducible, and I have a good lead on what's causing it that I think will help someone smarter than me start tracing it.

jrittvo avatar Jun 24 '23 06:06 jrittvo

Hi @jrittvo! Sorry, I was offline for a couple of days

Not really. I switched focus to some other stuff, hoping to get some input from the repo maintainers, as it looks like this is an internal issue unrelated to the model type or conversion parameters.

At some point I'll start testing this again, but I think I've already juggled all the model parameters :(

@atiorh if I can help you debug this issue, please let me know which logs I can provide to give you more context. The Xcode console is not showing any errors.

SaladDays831 avatar Jun 28 '23 11:06 SaladDays831

I got it running using a model converted with --quantize-nbits 6 and --attention-implementation SPLIT_EINSUM_V2, and no --chunk-unet, but only with prompt-to-image: startingImage has to be nil, otherwise the same out-of-memory crash occurs. It still uses a lot of memory and sometimes disconnects from the debugger during generation, but at least it works.
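For anyone trying to reproduce this, the working setup looks roughly like the following sketch (the conditioning image variable is a placeholder, and the controlNetInputs property name is as I understand it in 1.0.0):

// Text-to-image with ControlNet works; img2img (non-nil startingImage) still runs out of memory.
var config = StableDiffusionPipeline.Configuration(prompt: "a placeholder prompt")
config.stepCount = 15
config.schedulerType = .dpmSolverMultistepScheduler
config.controlNetInputs = [cannyConditioningImage]   // CGImage placeholder: the Canny edge map
config.startingImage = nil                           // must stay nil, otherwise the OOM crash returns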

Image to image with controlNet still doesn't work on a physical device

SaladDays831 avatar Jul 06 '23 06:07 SaladDays831

Hey @SaladDays831, thanks for your patience! I was just able to get back to this thread. Glad to hear you have ControlNet running (albeit without the startingImage) on the iPad! Providing startingImage loads VAEEncoder.mlmodelc (~70 MB of weights plus some more for activations), but that should almost not matter when reduceMemory is enabled, as the encoder is offloaded after the image is encoded once, before the diffusion loop starts. Could you please confirm that reduceMemory is still enabled in your tests?

atiorh avatar Jul 08 '23 23:07 atiorh

Hi @atiorh! I realized that I was missing VAEEncoder.mlmodelc. Adding --convert-vae-encoder to my command converted it successfully, and I'm able to run controlNet with startingImage now.

If I'm not missing anything, the --convert-vae-encoder parameter is not listed anywhere in the "Converting Models to Core ML" section of the readme, so it was a lucky guess 😁

Closing the issue as it's resolved. Also, here's the final conversion command I used; maybe it'll help out somebody like me :)

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-safety-checker --model-version "runwayml/stable-diffusion-v1-5" --unet-support-controlnet --quantize-nbits 6 --attention-implementation SPLIT_EINSUM_V2 --convert-controlnet "lllyasviel/sd-controlnet-canny" --bundle-resources-for-swift-cli -o "/path/to/save/the/model"

SaladDays831 avatar Jul 11 '23 12:07 SaladDays831

Thanks @SaladDays831, great to hear this is working now! Feedback received; I'll make --convert-vae-encoder more visible in the README.

atiorh avatar Jul 11 '23 20:07 atiorh

Hi @SaladDays831, it sounds like you're using the reduceMemory = true setting and it works fine. Could you let me know where you've placed your model files? I currently have my downloaded model files in the "Download" directory. When I set reduceMemory = true, it looks for the text encoder deeper, as TextEncoder.mlmodelc/coremldata.bin, and then says it can't find the model file and crashes. However, if I set it to false, it's able to locate the file and load the resources properly.

TimYao18 avatar Aug 17 '23 06:08 TimYao18

Hey @TimYao18, I'm running this on iPadOS, so my model files are inside the project directory, and I access them like so:

// Locate the folder of compiled Core ML models (.mlmodelc) bundled with the app.
guard let path = Bundle.main.path(forResource: "SD_1.5_canny_openpose", ofType: nil, inDirectory: "SD_Models") else {
    fatalError("Fatal error: failed to find the CoreML models.")
}
// The pipeline expects a file URL pointing at this resources directory.
let resourceURL = URL(fileURLWithPath: path)

Where SD_Models is a folder in the project root, and SD_1.5_canny_openpose is a folder inside it (with the models)
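That resourceURL then goes straight into the pipeline initializer, roughly like this (a sketch; the controlNet name string is a placeholder and should match your converted ControlNet model inside the resources folder):

let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourceURL,
    controlNet: ["YourConvertedControlNetName"],   // placeholder name
    configuration: MLModelConfiguration(),
    reduceMemory: true
)
try pipeline.loadResources()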

You can check out the MochiDiffusion repo to see how they do it on macOS

SaladDays831 avatar Aug 17 '23 09:08 SaladDays831