
Img2Img

Open jdluzen opened this issue 2 years ago • 6 comments

Hello and thanks for this great project!

I've attempted to add an init image to the pipeline, but it ends up just becoming a blur no matter how many steps it runs, as if a gaussian blur was applied to the image. Maybe different schedulers would affect it?

        public static Tensor<float> GetLatentSampleFromImage(Image<RgbaVector> image, int batchSize, int width, int height)
        {
            //noise here?
            image.Mutate(ctx =>
            {
                ctx.Resize(width - width % 64, height - height % 64);
            });
            var channels = 3;
            // size the tensor from the resized image so the loops below match
            var latents = new DenseTensor<float>(new[] { batchSize, channels, image.Height, image.Width });

            image.ProcessPixelRows(ctx =>
            {
                for (int y = 0; y < ctx.Height; y++)
                {
                    Span<RgbaVector> row = ctx.GetRowSpan(y);

                    for (int x = 0; x < row.Length; x++) // NHWC -> NCHW, like .transpose(0, 3, 1, 2)?
                    {
                        // RgbaVector channels are already 0..1 floats, so map to [-1, 1]
                        latents[0, 0, y, x] = row[x].R * 2f - 1f;
                        latents[0, 1, y, x] = row[x].G * 2f - 1f;
                        latents[0, 2, y, x] = row[x].B * 2f - 1f;
                    }
                }
            });
            return latents;
        }

Then encode it through the vae_encoder. Does that look about right?
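For concreteness, the encode-and-scale step might look roughly like this. This is only a sketch, not this repo's code: the input tensor name "sample" is an assumption from typical SD 1.x diffusers ONNX exports, and `EncodeToLatents`/`ScaleLatents` are made-up names, so verify against your own model's metadata.

```csharp
// Sketch only (not this repo's API). Assumes an SD 1.x vae_encoder ONNX
// export whose input tensor is named "sample"; verify the real names via
// session.InputMetadata or Netron before relying on this.
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

public static class VaeEncoderSketch
{
    // SD 1.x latents are multiplied by 0.18215 before the UNet sees them;
    // forgetting this scale is a classic cause of washed-out, blurry output.
    public static float[] ScaleLatents(float[] values) =>
        values.Select(v => v * 0.18215f).ToArray();

    public static Tensor<float> EncodeToLatents(Tensor<float> imageTensor, string vaeEncoderPath)
    {
        using var session = new InferenceSession(vaeEncoderPath);
        var inputs = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor("sample", imageTensor)
        };
        using var results = session.Run(inputs);
        var latents = results.First().AsTensor<float>();
        return new DenseTensor<float>(ScaleLatents(latents.ToArray()), latents.Dimensions);
    }
}
```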

jdluzen avatar Mar 31 '23 00:03 jdluzen

That code does seem to capture the general shape of it. Using the new EulerA scheduler, the output follows the input image only very slightly. There's no strength parameter available, and I've been trying to add one; without it, the pipeline behaves as if strength were around 0.99. My guess is that it's related to the sigmas, but I haven't narrowed it down yet.

jdluzen avatar Apr 05 '23 22:04 jdluzen

Hey, I am also trying to implement img2img!

From what I've understood across the Python implementations, it shouldn't be a complicated workflow (no worse than inpainting 😅).

What I am doing: instead of using GenerateLatentSample() to generate the noise, I call a function like yours that first resizes the image, then applies a Gaussian noise filter for the random part (maybe it's not the same kind of noise the reference pipelines expect), and then creates the tensor.
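For what it's worth, the noise the Python pipelines draw is just per-element standard-normal (N(0, 1)) samples, not an image-space filter. A minimal sketch (the class and method names are mine, not this repo's) using the Box-Muller transform:

```csharp
using System;

public static class NoiseSketch
{
    // Standard-normal samples via the Box-Muller transform, matching the
    // N(0, 1) latent noise that the Python img2img pipelines draw.
    public static float[] StandardNormal(int count, int seed)
    {
        var random = new Random(seed);
        var noise = new float[count];
        for (int i = 0; i < count; i++)
        {
            double u1 = 1.0 - random.NextDouble(); // in (0, 1], avoids Log(0)
            double u2 = random.NextDouble();
            noise[i] = (float)(Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2));
        }
        return noise;
    }
}
```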

Anyway, it's a nice project, easier than the Python implementation and much more fun since it's pure C# :)

TryCatchStackOverflow avatar Jun 02 '23 10:06 TryCatchStackOverflow

Please refer to my C++ solution, which also uses ONNX runtime: https://github.com/cassiebreviu/StableDiffusion Here is a sample app: https://github.com/axodox/unpaint

My example also does inpainting.

axodox avatar Jun 10 '23 10:06 axodox

Did anyone manage to do this? Is this repo dead? There seem to be no answers to the issues.

bamdad-b avatar Sep 05 '23 23:09 bamdad-b

> I've attempted to add an init image to the pipeline, but it ends up just becoming a blur no matter how many steps it runs [...] Then encode it through the vae_encoder. Does that look about right?

Whoops sorry, I won't add any links this time.

That's almost everything you need to do; the main thing missing is scaling the input by the model's latent scale factor:

  1. Resize the image, like you have
  2. Multiply by the scale factor, e.g. InputTensor * 0.18215f for StableDiffusion 1.5
  3. Implement an AddNoise method in the schedulers: (NoiseTensor * TimestepSigma) + SampleTensor
  4. Add the noise to the InputTensor
  5. Scale back the timesteps (this is what the Python apps call strength), something like:

        var initTimestep = Math.Min((int)(options.InferenceSteps * options.Strength), options.InferenceSteps);
        var start = Math.Max(options.InferenceSteps - initTimestep, 0);
        var newTimesteps = scheduler.Timesteps.Skip(start).ToList();

Then you can use these timesteps in the Inference method of the Unet class: for (int t = 0; t < newTimesteps.Count; t++)

This skips n steps, starting in the middle of the diffusion process, with the input latent standing in for the result of the skipped steps (if that makes sense). Any additional steps finish the image using the prompts and guidance provided.

If you find the changes to the input image are too subtle, try adding a random tensor to the input image before scaling; this introduces more noise and gives the inference more to work with to create detail. A random range from -1 to +1 should be fine.

And that should be it, Img2Img should work.
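Steps 3 and 5 above can be sketched as self-contained helpers (all names here are placeholders, not this repo's actual scheduler API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Img2ImgSketch
{
    // Steps 3/4: noised sample = sample + noise * sigma at the start timestep.
    public static float[] AddNoise(float[] sample, float[] noise, float sigma)
    {
        var noised = new float[sample.Length];
        for (int i = 0; i < sample.Length; i++)
            noised[i] = sample[i] + noise[i] * sigma;
        return noised;
    }

    // Step 5: drop the leading timesteps according to strength, so denoising
    // starts part-way through the schedule.
    public static List<int> TrimTimesteps(IReadOnlyList<int> timesteps, int inferenceSteps, float strength)
    {
        var initTimestep = Math.Min((int)(inferenceSteps * strength), inferenceSteps);
        var start = Math.Max(inferenceSteps - initTimestep, 0);
        return timesteps.Skip(start).ToList();
    }
}
```

With strength 1.0 nothing is skipped (pure text2img behaviour); with strength 0.2 only the last 20% of the steps run, so the output stays close to the input image.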

saddam213 avatar Oct 05 '23 02:10 saddam213

I have gotten this working to a reasonable degree of polish.


    public static Tensor<float> GetLatentSampleFromImage(Bitmap image, int width, int height, int seed, float initNoiseSigma, float imageStrength)
    {
        var random = new Random(seed);
        double noiseAmplitude = 1;
        imageStrength = imageStrength * 10f;
        // Resize the image
        image.Mutate(ctx =>
        {
            ctx.Resize(width - width / 8, height / 8);
        });
        var batchSize = 1;
        var channels = 4;
        var latents1 = new DenseTensor<float>(new[] { batchSize, channels, height / 8, width / 8 });
        var latents2 = new DenseTensor<float>(new[] { batchSize, channels, height / 8, width / 8 });
        var latents2Array = latents2.ToArray();
        for (int y = 0; y < image.Height; y++)
        {
            image.ProcessPixelRows(ctx =>
            {
                Span<RgbaVector> row = ctx.GetRowSpan(y);

                for (int x = 0; x < image.Width; x++)
                {
                    // ... (the rest of the method was lost to the issue page's HTML escaping)

The noise used in text2image is added to latents1, which is created from the code you presented. Specify imageStrength in the range 0.0 to 1.0; it is multiplied by 10 before being applied to latents1. Based on the value of imageStrength, the intensity of the combined noise is restrained to prevent saturation, which reduces the strength of the noise in latents2.
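If I read the description right, the blend amounts to something like the following (a sketch with invented names, my interpretation rather than the exact code):

```csharp
public static class BlendSketch
{
    // Mix image-derived latents with noise: a higher imageStrength lets more
    // noise through (more change), a lower one preserves the input image.
    public static float[] BlendLatents(float[] imageLatents, float[] noise, float imageStrength)
    {
        var blended = new float[imageLatents.Length];
        for (int i = 0; i < imageLatents.Length; i++)
            blended[i] = imageLatents[i] + noise[i] * imageStrength;
        return blended;
    }
}
```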

Replacing the existing GenerateLatentSample() call inside the Inference function with this, it operated successfully.


    public static Bitmap Inference(string prompt, string negativePrompt, int seed, StableDiffusionConfig config, Bitmap baseImage, float imageStrength)
    {
        // Preprocess text
        var textEmbeddings = TextProcessing.PreprocessText(prompt, negativePrompt, config);

        var scheduler = new LMSDiscreteScheduler();
        //var scheduler = new EulerAncestralDiscreteScheduler();
        var timesteps = scheduler.SetTimesteps(config.NumInferenceSteps);
        // If you use the same seed, you will get the same image result.
        if (seed == 0)
        {
            seed = new Random().Next();
        }
        //var seed = 329922609;
        Console.WriteLine($"Seed generated: {seed}");
        // Create the latent tensor from the base image instead of random noise
        //var latents = GenerateLatentSample(config, seed, scheduler.InitNoiseSigma); // commented out
        var latents = GetLatentSampleFromImage(config, baseImage, seed, scheduler.InitNoiseSigma, imageStrength);
        var sessionOptions = config.GetSessionOptionsForEp();
        // Create Inference Session
        var unetSession = new InferenceSession(config.UnetOnnxPath, sessionOptions);
        var input = new List<NamedOnnxValue>();
        for (int t = 0; t < timesteps.Length; t++)
        {
            // ... (the denoising loop was lost to the issue page's HTML escaping)

If you increase the imageStrength, the quality will be impaired, so be careful not to increase it too much.

DevU2T avatar Oct 24 '23 08:10 DevU2T