
Image2image

Open littleowl opened this issue 2 years ago • 14 comments

Adds image2image functionality.

In Python, a new CoreML model can be generated to encode the latent space for image2image. The model bakes in some of the operations typically performed in the pipeline, so a separate model does not need to be created for those operations, nor does the CPU need to perform the tensor multiplications. Some of the simpler math involving the scheduler's timesteps is performed on the CPU and passed into the encoder. The encoder works around the missing torch.randn operation by passing in noise tensors that are applied to the image's latent space.
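For illustration, the img2img latent setup that such an encoder bakes in looks roughly like this - a Python sketch with assumed names, not this PR's exact code; the scheduler scalars are computed on the CPU and the passed-in noise tensors replace torch.randn:

import torch

def encode_for_img2img(vae, image, vae_noise, latent_noise,
                       sqrt_alpha_cumprod_t, sqrt_one_minus_alpha_cumprod_t):
    # Reparameterized VAE sample, with the Gaussian noise passed in from outside
    dist = vae.encode(image).latent_dist
    latents = dist.mean + dist.std * vae_noise
    latents = 0.18215 * latents  # Stable Diffusion latent scaling factor
    # Forward-diffuse the clean latents to the chosen starting timestep
    return (sqrt_alpha_cumprod_t * latents
            + sqrt_one_minus_alpha_cumprod_t * latent_noise)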

In Swift, an Encoder class is created, with various changes to the scheduler, pipeline, and CLI to support an input image and strength. CGImage creation from MLShapedArray is moved into its own file, along with a new function to create an MLShapedArray from a CGImage. Image loading and preparation is currently handled and optimized with vImage.

Understandably, there might be a desire to use the Image input type for CoreML / CoreMLTools; however, I chose not to optimize in this way at this point because of trouble I have had getting enumerated input shapes to work with the models and the current Python script. Please see: #69 and #70.

The new DPMSolverMultistepScheduler does not work with image2image, and looking at the Diffusers library documentation, it does not look like it is supported there either, so it is currently disabled and should throw an error. I also made it safe so it will not crash.

Thank you for providing this repo.

Do not erase the below when submitting your pull request: #########

  • [x] I agree to the terms outlined in CONTRIBUTING.md

littleowl avatar Dec 18 '22 17:12 littleowl

When using --image on the Swift CLI (macOS 13.1) I'm getting "Error: startingImageProvidedWithoutEncoder". Running the patched code without --image runs fine and produces output similar to the original/non-patched code. Any suggestions on where to start troubleshooting?

jsdomingue avatar Dec 19 '22 07:12 jsdomingue

You would need to run the Python script to generate the Encoder model, for instance:

python -m python_coreml_stable_diffusion.torch2coreml --model-version ../stable-diffusion-2-base --convert-vae-encoder --bundle-resources-for-swift-cli --check-output-correctness --attention-implementation ORIGINAL -o ../Generated/CoreML/StableDiffusion2-base/ORIGINAL

I haven't published any generated CoreML models on Hugging Face; maybe I will do that at some point.

littleowl avatar Dec 19 '22 07:12 littleowl

In my Swift test app I'm getting an "array out of range" error in file AlphasCumprodCalculation.swift, line 25. It tries to subscript item 1002 of an array containing 1000 items. Changing this line to let initTimestep = timesteps - timesteps / steps * (steps - tEnc) - 1 seems to fix the issue. Am I missing something?

TheMurusTeam avatar Dec 19 '22 16:12 TheMurusTeam

@TheMurusTeam Thanks for catching that! I have not tested with a full strength of 1.0 since altering that part of the code. A strength of 1.0 kind of defeats the purpose, so maybe it should not be the default. Your change will fail if strength is 0.0, though. I pushed a change that clamps this result so it's safe.
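For reference, the clamped computation amounts to something like this (a Python sketch of the arithmetic, not the actual Swift code in AlphasCumprodCalculation.swift):

def init_timestep(timesteps: int, steps: int, t_enc: int) -> int:
    # t_enc = int(steps * strength). Unclamped, strength 1.0 yields an index of
    # `timesteps` (out of range), while the "- 1" variant yields -1 at strength 0.0.
    t = timesteps - timesteps // steps * (steps - t_enc)
    return min(max(t, 0), timesteps - 1)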

littleowl avatar Dec 19 '22 21:12 littleowl

Hey @littleowl, thank you for the PR! This is a relatively extensive one so could you please rebase on main and separate this into two PRs: one for the Python component and the other one for the Swift component?

atiorh avatar Dec 27 '22 03:12 atiorh

I did a quick pass and noticed that you are passing noise tensors as inputs due to the randn op missing from the coremltools torch frontend. I think it would help simplify the interface and the overall code if the noise tensor were produced inside the model. Would you be willing to try something like the following?

from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil import Builder as mb

@register_torch_op
def randn(context, node):
    # torch.randn carries five inputs (size, dtype, layout, device, pin_memory);
    # only the shape is needed here
    inputs = _get_inputs(context, node, expected=5)
    shape = inputs[0]

    # Lower to Core ML's random_normal op with a standard normal distribution
    x = mb.random_normal(shape=shape, mean=0., stddev=1.)
    context.add(x, node.name)

This (or a version of this) should enable automatic conversion of torch.randn op.
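Hypothetical usage of that registration, as a quick sanity check (a sketch, not tested against this model):

import torch
import coremltools as ct

class AddNoise(torch.nn.Module):
    def forward(self, x):
        # torch.randn should now convert via the op registered above
        return x + torch.randn(1, 4, 64, 64)

traced = torch.jit.trace(AddNoise().eval(), torch.zeros(1, 4, 64, 64))
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=(1, 4, 64, 64))],
    convert_to="mlprogram",
)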

atiorh avatar Dec 27 '22 03:12 atiorh

@pcuenca Could you please chime in on whether DPMSolverMultistepScheduler should or should not be supported by image2image? 🙏

atiorh avatar Dec 27 '22 04:12 atiorh

@atiorh I will be happy to split the PR into as many pieces as desired. The only question I have about the randn override function is how to provide the seed. I'll try to figure that out, but if you know the answer offhand, that would be helpful. I will also amend the README with instructions on generating the encoder model.

I do plan to follow up with a pipeline supporting in-painting after these code changes go in.

@pcuenca I do have the same question. I'm not sure whether DPMSolverMultistepScheduler should/will support image2image or not. Initially, I noticed that this scheduler was not among the supported schedulers in the diffusers library for this, but on looking again I saw a reference suggesting that maybe it might be. Nonetheless, I did try to make it work - with some interesting image artifacts - but did not find a solution. If it should be supported, I could use your help or insight to make it so.

@atiorh - As an aside regarding a different issue - along the lines of providing alternatives to PyTorch operations, for the goal of dynamic-aspect-ratio inputs: I did get past an initial error when trying to support flexible input shapes by doing a similar routine as you described to register an op for group_norm, only to be confronted with another operation that was missing, for which I could not find a workaround. (I forget exactly what it was, but it seems like it was element-wise addition or similar.) [#70] Also, I'm not sure if there are more complications after that one, and whether the MLProgram (vs. neural network) will work with dynamic shapes for this model or not. What is the best place to seek help with this: the coremltools repo or the developer forums?
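For context, the flexible-shape conversion being attempted looks roughly like this (a sketch; the resolutions are assumptions for illustration, and traced_model stands for the traced encoder from the conversion script):

import coremltools as ct

# Candidate resolutions are assumptions for illustration
shapes = ct.EnumeratedShapes(shapes=[(1, 3, 512, 512),
                                     (1, 3, 512, 768),
                                     (1, 3, 768, 512)])
mlmodel = ct.convert(traced_model,
                     inputs=[ct.TensorType(name="image", shape=shapes)],
                     convert_to="mlprogram")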

littleowl avatar Dec 27 '22 12:12 littleowl

@atiorh @littleowl I think DPMSolverMultistepScheduler should also work. I just double-checked in diffusers and it did.

I'll try to debug what might be going on.

pcuenca avatar Dec 28 '22 11:12 pcuenca

The problem is that the timesteps reversal had been moved to the new function calculateTimesteps, but only applied for the PNDM scheduler. See this PR for a proposed fix.

With this change, image2image generation works using DPMSolverMultistepScheduler.
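To illustrate, scheduler-agnostic timestep handling for img2img looks roughly like this (a sketch with assumed names, not the repo's actual code):

def calculate_timesteps(num_train_steps, num_inference_steps, strength):
    # Build the full schedule in descending (denoising) order for every
    # scheduler, then keep only the tail implied by the requested strength.
    step = num_train_steps // num_inference_steps
    timesteps = list(range(0, num_train_steps, step))[::-1]
    t_start = max(num_inference_steps - int(num_inference_steps * strength), 0)
    return timesteps[t_start:]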

(Side note: while debugging this I noticed minor differences in the timesteps for DPMSolverMultistepScheduler with respect to the reference Python implementation. I'll open a separate PR about that).

pcuenca avatar Dec 28 '22 19:12 pcuenca

@pcuenca Amazing, thank you!

@littleowl I recommend that we merge @pcuenca's PR into your fork and verify that the DPMSolverMultistepScheduler is working as expected, including the minor timestep differences Pedro mentioned above. In the meantime, feel free to submit the Python-only PR so we can start iterating on that. Regarding your question on setting the seed for the Core ML based randn op, I will look into that and get back to you. Regarding your other question about dynamic aspect ratios and flexible shapes, please create an issue on the coremltools repo with a minimal repro (and a reference to this thread on this repo) and cc me so I can also help move things along. 🙏

atiorh avatar Dec 29 '22 05:12 atiorh

any updates on the swift image to image? 👀

such awesome work btw

EthanSK avatar Jan 02 '23 17:01 EthanSK

Sorry for any delay. I’ve been spending time with family and am just getting back from vacation. I should have the new PRs ready to go in the next day or so.

littleowl avatar Jan 02 '23 17:01 littleowl

woohoo. got it working!

I've been an iOS dev for a long time, but am very much an ML noob. A couple of things I stumbled through:

  1. I used the StableDiffusionSample, and there was an error with the type of the seed variable (UInt32 vs Int); I just forced it to an Int and it worked fine.
  2. I also ran into the encoder model not being compiled. This stumped me for a bit, because I had run the python command to compile the resources but didn't see the specifics of @littleowl 's note above. The important bit is the --convert-vae-encoder flag (not just the text encoder). Once I added that, it worked fine.
  3. Images have to be 512x512 or it errors out.
  4. Pretty quickly ran into 'failed to generate prediction for sample 0' on my Mac mini M1. Added the --reduce-memory option and it seems happy again.

But I have it working now! This is amazing...can't wait to see what 2023 brings if Apple is open sourcing this now!

pj4533 avatar Jan 08 '23 02:01 pj4533

It crashes on my real device (iPhone XS Max) because the encoder runs out of memory at around 2 GB, even though I enabled the extended addressing and increased memory capabilities. Has anyone gotten it to run on an actual iOS device, as opposed to the simulator?

EthanSK avatar Jan 10 '23 15:01 EthanSK

@EthanSK I don’t think you will get the framework to work on your iPhone XS, since the minimum system requirements say you’ll need at least an iPhone 12 Pro with 6GB+ RAM.

martinlexow avatar Jan 10 '23 15:01 martinlexow

Just wanted to pop back in here and say how much fun this PR has enabled for me. I stumble through Python, but feel very comfortable with Swift, so this is great! The first thing I did was modify the sample so that you can use the output as the input... 🤯

Hope it gets merged soon.

pj4533 avatar Jan 10 '23 22:01 pj4533

Ah, so the issue is using v2.1 with the Neural Engine enabled (on an actual device).

Would anyone be able to provide a compiled split_einsum model with the VAE encoder in it for version 2 or earlier, please? @pcuenca I'm having difficulty generating them locally ("Error computing NN outputs").

Also, I'm confused about how to go from .mlpackage to the separated .mlmodelc files?

EthanSK avatar Jan 11 '23 23:01 EthanSK

> @EthanSK I don’t think you will get the framework to work on your iPhone XS, since the minimum system requirements say you’ll need at least an iPhone 12 Pro with 6GB+ RAM.

Speaking of which: does anyone know of required device settings that will allow it to run only on supported devices?

For example, iphone-ipad-minimum-performance-a12 is the closest I can find, but it will allow it to run on A12 devices, which includes the XS model.

3DTOPO avatar Jan 13 '23 23:01 3DTOPO

> @EthanSK I don’t think you will get the framework to work on your iPhone XS, since the minimum system requirements say you’ll need at least an iPhone 12 Pro with 6GB+ RAM.
>
> Speaking of which: does anyone know of required device settings that will allow it to run only on supported devices?
>
> For example, iphone-ipad-minimum-performance-a12 is the closest I can find, but it will allow it to run on A12 devices, which includes the XS model.

I have the same problem. By setting required device capabilities, I found that only the iphone-ipad-minimum-performance-a12 option comes close. Apple reviewers may use some non-functioning devices for review; my app was rejected for this reason when I released a new version.

jiangdi0924 avatar Jan 16 '23 01:01 jiangdi0924

> I have the same problem. By setting required device capabilities, I found that only the iphone-ipad-minimum-performance-a12 option comes close. Apple reviewers may use some non-functioning devices for review; my app was rejected for this reason when I released a new version.

The issue is much larger than getting rejected. It means anyone can purchase it and install it on unsupported devices, which will almost certainly get you 1-star reviews and amounts to a form of theft.

Personally, I've lost all interest in Stable Diffusion for now. Now that they are being sued, it seems like any products based on it are a liability and I can't afford the risk, not to mention the moral concerns.

It makes earning a living on the App Store even harder because I have to compete against apps that I can't compete against...

In any event, I do wish we could restrict devices to M1 or better.

3DTOPO avatar Jan 16 '23 01:01 3DTOPO

> I have the same problem. By setting required device capabilities, I found that only the iphone-ipad-minimum-performance-a12 option comes close. Apple reviewers may use some non-functioning devices for review; my app was rejected for this reason when I released a new version.
>
> The issue is much larger than getting rejected. It means anyone can purchase it and install it on unsupported devices, which will almost certainly get you 1-star reviews and amounts to a form of theft.
>
> Personally, I've lost all interest in Stable Diffusion for now. Now that they are being sued, it seems like any products based on it are a liability and I can't afford the risk, not to mention the moral concerns.
>
> It makes earning a living on the App Store even harder because I have to compete against apps that I can't compete against...
>
> In any event, I do wish we could restrict devices to Apple Silicon.

Definitely, bad reviews are not conducive to the development of the product.

jiangdi0924 avatar Jan 16 '23 02:01 jiangdi0924

@littleowl Just checking in after the holidays, please let us know if you are blocked on anything 🙏

atiorh avatar Jan 24 '23 17:01 atiorh

> Would anyone be able to provide a compiled split_einsum model with the VAE encoder in it for version 2 or earlier, please? @pcuenca I'm having difficulty generating them locally ("Error computing NN outputs").

I'll take a look :)

pcuenca avatar Jan 26 '23 14:01 pcuenca

Sorry @atiorh and everyone for the delay, and thanks for keeping the conversation going. I will close this PR; I have opened two more: Python and Swift.

I've been busy lately. While trying to override the randn function, I realized I had already tried that approach when first encountering the issue. I detail the errors in the description of the other PR, and I also explain there why I don't believe it to be the right thing to do anyway: I think it is better to generate the noise on the CPU / in Swift space, for flexibility of techniques.
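The idea, sketched with hypothetical input names (the caller owns the seed and feeds the noise like any other input):

import numpy as np

rng = np.random.default_rng(seed=93)  # seeding stays under the caller's control
noise = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)

# `encoder` is the generated CoreML encoder model; the input names here are
# hypothetical and would need to match the converted model's interface.
latents = encoder.predict({"image": image_array, "noise": noise})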

The good news is that it is easy to get in-painting working after these changes.

littleowl avatar Jan 28 '23 03:01 littleowl