StableDiffusion
Img2Img
Hello and thanks for this great project!
I've attempted to add an init image to the pipeline, but it ends up just becoming a blur no matter how many steps it runs, as if a Gaussian blur were applied to the image. Maybe a different scheduler would affect it?
public static Tensor<float> GetLatentSampleFromImage(Image<RgbaVector> image, int batchSize, int width, int height)
{
//noise here?
// Snap to the nearest multiple of 64 so the VAE downsampling works out
image.Mutate(ctx =>
{
ctx.Resize(width - width % 64, height - height % 64);
});
var channels = 3;
var latents = new DenseTensor<float>(new[] { batchSize, channels, image.Height, image.Width });
image.ProcessPixelRows(ctx =>
{
for (int y = 0; y < ctx.Height; y++)
{
Span<RgbaVector> row = ctx.GetRowSpan(y);
for (int x = 0; x < row.Length; x++) // NCHW layout, i.e. the .transpose(0, 3, 1, 2) from Python
{
// RgbaVector channels are already 0..1 floats; map them to [-1, 1]
latents[0, 0, y, x] = row[x].R * 2f - 1f;
latents[0, 1, y, x] = row[x].G * 2f - 1f;
latents[0, 2, y, x] = row[x].B * 2f - 1f;
}
}
});
return latents;
}
Then encode it through the vae_encoder. Does that look about right?
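For reference, the per-channel mapping in the loop above can be sanity-checked in isolation (ImageSharp's RgbaVector stores channels as floats in [0, 1], and the VAE encoder expects [-1, 1]):

```csharp
using System;

// Maps a single 0..1 channel value to the [-1, 1] range expected by the VAE encoder.
static float ToVaeRange(float channel) => channel * 2f - 1f;

// Quick check of the endpoints and midpoint.
Console.WriteLine(ToVaeRange(0f));   // -1
Console.WriteLine(ToVaeRange(0.5f)); // 0
Console.WriteLine(ToVaeRange(1f));   // 1
```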
The code does look like the general shape of it. Using the new EulerA scheduler, the output follows the input image only very slightly. There's no strength parameter available, and I've been trying to add one. Without it, the pipeline follows the input image at a strength of maybe 0.99. My guess is that it's related to the sigmas, but I haven't narrowed it down yet.
Hey, I am also trying to implement img2img!
From what I've understood across the Python implementations, it shouldn't be a complicated workflow (no worse than inpainting 😅).
What I am doing: instead of using GenerateLatentSample() to generate the noise, I call a function like you did, which first resizes the image, then applies a Gaussian noise filter for the random result (maybe it's not the same kind of noise as expected from the guidelines), and then creates the tensor.
Anyway, it's a nice project: easier than the Python implementation and much more fun, since it's pure C# :)
Please refer to my C++ solution, which also uses ONNX Runtime: https://github.com/cassiebreviu/StableDiffusion
Here is a sample app: https://github.com/axodox/unpaint
My example also does inpainting.
Did anyone manage to do this? Is this repo dead? There seem to be no answers to the issues.
Whoops, sorry, I won't add any links this time.
That's almost everything you need to do. The main remaining piece is scaling the input image by the model's scale factor:
- Resize the image, like you have
- Multiply by the scale factor, e.g. InputTensor * 0.18215f for StableDiffusion 1.5
- Implement an AddNoise method in the schedulers: (NoiseTensor * TimestepSigma) + SampleTensor
- Add the noise to the InputTensor
- You will need to scale back the timesteps; they call this strength in the Python app. Something like:
var initTimestep = Math.Min((int)(options.InferenceSteps * options.Strength), options.InferenceSteps);
var start = Math.Max(options.InferenceSteps - initTimestep, 0);
var newTimesteps = scheduler.Timesteps.Skip(start).ToList();
Then you can use these timesteps in the Inference method of the Unet class:
for (int t = 0; t < newTimesteps.Count; t++)
This skips n steps, starting the diffusion process in the middle, with the input latent standing in as the result of the skipped steps (if that makes sense). Any additional steps finish the image using the prompts and guidance provided.
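To make the timestep truncation concrete, here is a standalone sketch; a plain int array stands in for scheduler.Timesteps, which is an assumption about its shape:

```csharp
using System;
using System.Linq;

int inferenceSteps = 20;
float strength = 0.75f;

// Stand-in for scheduler.Timesteps: 20 descending timesteps.
var timesteps = Enumerable.Range(0, inferenceSteps).Select(i => 999 - i * 50).ToArray();

// Same arithmetic as the snippet above: strength decides how many steps to keep.
var initTimestep = Math.Min((int)(inferenceSteps * strength), inferenceSteps);
var start = Math.Max(inferenceSteps - initTimestep, 0);
var newTimesteps = timesteps.Skip(start).ToList();

Console.WriteLine(start);              // 5  -> the first 5 steps are skipped
Console.WriteLine(newTimesteps.Count); // 15 -> diffusion starts 25% of the way in
```

So a strength of 1.0 runs the full schedule (ignoring the input image entirely), while a strength near 0.0 leaves almost no steps to change it.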
If you find that the changes to the input image are too subtle, try adding a random tensor to the InputImage before scaling. This can introduce more noise to the image and give the inference more to work with to create more details; just a random range from -1 to +1 should be fine.
And that should be it, Img2Img should work
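The AddNoise step from the list above, written out as a standalone sketch over plain float arrays (the real implementation would operate on the scheduler's tensor types; the sigma value here is arbitrary):

```csharp
using System;

// AddNoise per the formula above: (NoiseTensor * TimestepSigma) + SampleTensor.
static float[] AddNoise(float[] sample, float[] noise, float sigma)
{
    var result = new float[sample.Length];
    for (int i = 0; i < sample.Length; i++)
    {
        // Elementwise: scale the noise by the sigma of the starting timestep,
        // then add it onto the (already scaled) image latent.
        result[i] = noise[i] * sigma + sample[i];
    }
    return result;
}

var sample = new float[] { 0.1f, -0.2f, 0.3f };
var noise = new float[] { 1f, -1f, 0.5f };
var noised = AddNoise(sample, noise, 2f);
Console.WriteLine(string.Join(" ", noised));
```

The sigma used here would be the sigma of the first timestep actually run, i.e. the one at index `start` after the truncation above.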
I have got this working reasonably well.
public static Tensor<float> GetLatentSampleFromImage(Bitmap image, int width, int height, int seed, float initNoiseSigma, float imageStrength)
{
var random = new Random(seed);
double noiseAmplitude = 1;
imageStrength = imageStrength * 10f;
//ImageResize
image.Mutate(ctx =>
{
ctx.Resize(width / 8, height / 8);
});
var batchSize = 1;
var channels = 4;
var latents1 = new DenseTensor<float>(new[] { batchSize, channels, height / 8, width / 8 });
var latents2 = new DenseTensor<float>(new[] { batchSize, channels, height / 8, width / 8 });
var latents2Array = latents2.ToArray();
for (int y = 0; y < image.Height; y++)
{
// ...
}
The noise used in text2image is added to latents1, which is created from the code you presented. Specify imageStrength in the range 0.0 to 1.0; it is multiplied by 10 before being applied to latents1. To prevent saturation, the intensity of the added noise is restrained based on the value of imageStrength, which reduces the strength of the noise in latents2.
Replacing the existing GenerateLatentSample() call inside the Inference function with this one worked successfully.
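A possible reading of the blending described above, as a standalone sketch; the helper name, the placement of the 10x factor, and the attenuation formula are my assumptions from the description, not the poster's exact code:

```csharp
using System;

// Hypothetical blend: noise is added to the image latent at a strength-dependent
// amplitude, while the attenuation factor keeps the sum from saturating.
static float[] BlendImageAndNoise(float[] imageLatent, float[] noise, float imageStrength)
{
    var scaled = imageStrength * 10f; // the post multiplies strength by 10
    var result = new float[imageLatent.Length];
    for (int i = 0; i < imageLatent.Length; i++)
    {
        // Higher imageStrength raises the noise amplitude but also the attenuation.
        result[i] = imageLatent[i] + noise[i] * scaled * (1f - imageStrength);
    }
    return result;
}

var latent = new float[] { 0.5f, -0.5f };
var noise = new float[] { 0.1f, 0.2f };
var blended = BlendImageAndNoise(latent, noise, 0.5f);
Console.WriteLine(blended[0]); // 0.5 + 0.1 * 5 * 0.5 = 0.75
```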
public static Bitmap Inference(string prompt, string negativePrompt, int seed, StableDiffusionConfig config, Bitmap baseImage, float imageStrength)
{
// Preprocess text
var textEmbeddings = TextProcessing.PreprocessText(prompt, negativePrompt, config);
var scheduler = new LMSDiscreteScheduler();
//var scheduler = new EulerAncestralDiscreteScheduler();
var timesteps = scheduler.SetTimesteps(config.NumInferenceSteps);
// If you use the same seed, you will get the same image result.
if (seed == 0)
{
seed = new Random().Next();
}
//var seed = 329922609;
Console.WriteLine($"Seed generated: {seed}");
// Create the latent tensor from the input image instead of pure noise
//var latents = GenerateLatentSample(config, seed, scheduler.InitNoiseSigma); // Commented out
var latents = GetLatentSampleFromImage(config, baseImage, seed, scheduler.InitNoiseSigma, imageStrength);
var sessionOptions = config.GetSessionOptionsForEp();
// Create inference session
var unetSession = new InferenceSession(config.UnetOnnxPath, sessionOptions);
var input = new List<NamedOnnxValue>();
for (int t = 0; t < timesteps.Length; t++)
{
// ...
}
If you increase the imageStrength, the quality will be impaired, so be careful not to increase it too much.