
New Scheduler: add Euler Ancestral Scheduler to StableDiffusionPipeline

Open AbdullahAlfaraj opened this issue 1 year ago • 5 comments

Description:

I've added the Euler Ancestral Scheduler to the Stable Diffusion Pipeline. The code has been adapted from: https://github.com/crowsonkb/k-diffusion

The scheduler currently works; however, I still need to clean it up and document it, which I will do soon.

Checklist:

  • [X] functional code
  • [ ] clean up code
  • [ ] document the code

AbdullahAlfaraj avatar Sep 24 '22 19:09 AbdullahAlfaraj

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

awesome!!!!

dblunk88 avatar Sep 25 '22 21:09 dblunk88

This is awesome! I tried running it and got some numpy errors like this

Traceback (most recent call last):
  File "generate_candidates.py", line 13, in <module>
    pipe.scheduler = EulerAScheduler()
  File "/diffusers/src/diffusers/configuration_utils.py", line 401, in inner_init
    init(self, *args, **init_kwargs)
  File "/diffusers/src/diffusers/schedulers/scheduling_euler_a.py", line 189, in __init__
    self.sigmas = get_sigmas(self.DSsigmas,self.num_inference_steps)
  File "/diffusers/src/diffusers/schedulers/scheduling_euler_a.py", line 41, in get_sigmas
    return append_zero(np.flip(sigmas, 0))
  File "/diffusers/src/diffusers/schedulers/scheduling_euler_a.py", line 31, in append_zero
    return torch.cat([x, x.new_zeros([1])])
AttributeError: 'numpy.ndarray' object has no attribute 'new_zeros'
root@2a12b16379a2:/src# pip freeze | grep numpy
numpy==1.23.3

Do you know with which version of numpy this is working?

nielsrolf avatar Oct 10 '22 10:10 nielsrolf

Hi @nielsrolf, thank you for the feedback. I have the same version of numpy as you:

python -c "import numpy; print(numpy.__version__)"
1.23.3

I fixed a bug in my code related to mixing a numpy array and a torch tensor in the same operation; the issue arises only when the device is set to cuda. See if that fixes it for you. Here is a colab that should provide you with an example of how to use it: https://colab.research.google.com/drive/1-c2WEXV46Hc-3ff_aPsZclyWZkdVKXWM#scrollTo=uj5o-ivImSnM
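For context, here is a minimal sketch (simplified and hypothetical, not the exact PR code) of the numpy/torch mix-up behind that traceback and one way to fix it: np.flip returns a numpy view, which has no new_zeros method and, because of its negative strides, can't be wrapped by torch.from_numpy without a copy.

```python
import numpy as np
import torch

# Simplified sketch of the failing path: get_sigmas feeds a flipped numpy
# array into append_zero, which assumes a torch tensor. Converting (and
# copying, since np.flip returns a negative-stride view that
# torch.from_numpy rejects) avoids the AttributeError.
def append_zero(x):
    if isinstance(x, np.ndarray):
        x = torch.from_numpy(np.ascontiguousarray(x))
    return torch.cat([x, x.new_zeros([1])])

sigmas = np.linspace(14.6, 0.03, 10, dtype=np.float32)
out = append_zero(np.flip(sigmas, 0))
print(out.shape)  # torch.Size([11])
```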

I will refactor the code soon according to the new redesign in #637 and #719

AbdullahAlfaraj avatar Oct 11 '22 02:10 AbdullahAlfaraj

Super nice PR :heart_eyes:

@anton-l @patil-suraj does any of you have time to dive into this? Otherwise happy to take a look

patrickvonplaten avatar Oct 11 '22 18:10 patrickvonplaten

Really cool @AbdullahAlfaraj so nice to see someone working on this!

I've also been in the process of adding Euler Ancestral, regular Euler and the other DPM solvers, and was curious about something in your implementation; hoping you can shed some light.

I could be wrong, but it seems that you omit the CFGDenoiser for the preconditioning of the denoiser model, and instead substitute a different formulation of a discrete noise schedule that you use to precondition with.

Wondering why you made this decision? And are there any resources/papers on this that we could check out? It seems to deviate a bit from k-diffusion.

Thanks!

tonetechnician avatar Oct 13 '22 22:10 tonetechnician

Let us know if you need help @AbdullahAlfaraj :-)

patrickvonplaten avatar Oct 14 '22 18:10 patrickvonplaten

Hi, @tonetechnician

I could be wrong, but it seems that you omit the CFGDenoiser for the preconditioning of the denoiser model, and instead substitute a different formulation of a discrete noise schedule that you use to precondition with.

I couldn't find the Euler Ancestral Algorithm described in https://arxiv.org/abs/2206.00364 paper so I decided to adapt the implementation from the https://github.com/crowsonkb/k-diffusion repo. As for the CFGDenoiser I checked the deforum stable diffusion implementation: https://github.com/deforum/stable-diffusion/blob/main/helpers/k_samplers.py

import torch
from torch import nn

class CFGDenoiser(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.inner_model = model

    def forward(self, x, sigma, uncond, cond, cond_scale):
        x_in = torch.cat([x] * 2)  # concat the latents
        sigma_in = torch.cat([sigma] * 2)  # concat sigma
        cond_in = torch.cat([uncond, cond])
        uncond, cond = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(2)
        return uncond + (cond - uncond) * cond_scale  # classifier-free guidance

I'm not sure which part of this code I've missed.

Wondering why you made this decision? And are there any resources/papers on this that we could check out? It seems to deviate a bit from k-diffusion.

I'm mostly following the deforum/stable-diffusion repo. From what I understand, they wrap the diffusion model in a DiscreteSchedule class that converts the noise levels from a continuous to a discrete time signal. You can find it here: https://github.com/crowsonkb/k-diffusion/blob/f4e99857772fc3a126ba886aadf795a332774878/k_diffusion/external.py
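For reference, a condensed sketch of that conversion, adapted from k-diffusion's DiscreteSchedule.sigma_to_t (simplified; the real class also supports a quantized lookup): a continuous sigma is mapped to a fractional timestep by interpolating in log-sigma space between the two nearest discrete noise levels.

```python
import torch

# Condensed from k-diffusion's DiscreteSchedule.sigma_to_t: map a continuous
# sigma to a fractional index into an ascending tensor of discrete sigmas.
def sigma_to_t(sigma, discrete_sigmas):
    log_sigma = sigma.log()
    log_sigmas = discrete_sigmas.log()
    dists = log_sigma - log_sigmas[:, None]
    # index of the largest discrete log-sigma that is still <= log_sigma
    low_idx = dists.ge(0).cumsum(dim=0).argmax(dim=0).clamp(max=log_sigmas.shape[0] - 2)
    high_idx = low_idx + 1
    low, high = log_sigmas[low_idx], log_sigmas[high_idx]
    w = ((low - log_sigma) / (low - high)).clamp(0, 1)
    return (1 - w) * low_idx + w * high_idx

t = sigma_to_t(torch.tensor([1.0]), torch.tensor([0.1, 1.0, 10.0]))
print(t)  # tensor([1.]) -- sigma 1.0 sits exactly at discrete index 1
```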

I tried to follow the implementation very closely, though I could have messed something up. For what it's worth, when you run Stable Diffusion inference with the Euler Ancestral scheduler, you get similar results in both Deforum and Diffusers.

I've also been in the process of adding Euler ancestral, regular euler and the other dpm solvers and was curious about something in your implementation - hoping you can shed some light.

That's awesome. If you make a PR, I won't mind closing this one and continuing development on your branch.

AbdullahAlfaraj avatar Oct 14 '22 23:10 AbdullahAlfaraj

Hi @patrickvonplaten I'm facing a couple of issues trying to get the Euler ancestral scheduler to be used in the same way as the rest of the schedulers. I will write about it soon.

AbdullahAlfaraj avatar Oct 15 '22 00:10 AbdullahAlfaraj

Thanks for the feedback @AbdullahAlfaraj!

I think we're on the same page here now. I wasn't sure why the CFGDenoiser was commented out in your code, but after searching through crowsonkb's and deforum's repos I saw where you were getting those values. I think we're on a similar path; I've just been integrating more closely with the k_diffusion library by wrapping its functions rather than pulling them out and adding them to the scheduler.

I found that k_diffusion seems to default to DPM solvers plus the Karras paper's noise schedule. I went through a process last night trying out the different noise schedules in the paper you referenced, to no avail, which is likely due to my bad CFGDenoiser implementation. I do think there is room to offer different noise schedule generation options in diffusers down the line, though.
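For completeness, the Karras schedule in question is short enough to sketch here. This mirrors k-diffusion's get_sigmas_karras (eq. 5 of the Karras et al. 2022 paper, rho = 7); the 0.03/14.6 endpoints in the example are just illustrative Stable-Diffusion-like values, not values taken from this PR.

```python
import torch

# Karras et al. (2022) noise schedule, eq. 5: interpolate between
# sigma_max and sigma_min in sigma^(1/rho) space, then append a final 0.
def get_sigmas_karras(n, sigma_min, sigma_max, rho=7.0):
    ramp = torch.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return torch.cat([sigmas, sigmas.new_zeros([1])])

s = get_sigmas_karras(5, 0.03, 14.6)
print(s)  # descends from 14.6 to 0.03, with a trailing 0.0
```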

Haven't reached a point where I'm happy to make a PR yet, so I'll continue on my side and report back if I make some progress. Nice to be working in tandem with you on this, though, since we can discuss it a bit more.

tonetechnician avatar Oct 15 '22 08:10 tonetechnician

See Katherine's note on https://github.com/huggingface/diffusers/issues/277#issuecomment-1279452625

k-diffusion's Euler and Euler Ancestral samplers are just the VE versions of DDIM and the original DDPM sampling method though, they should actually produce the same outputs as their VP counterparts if you convert a VP diffusion process to VE...

diffusers implements DDPM's method in https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddpm.py

What distinguishes the scheduler in this PR from that one?
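For reference, the VP-to-VE conversion Katherine refers to can be sketched in a few lines. This is a hedged sketch assuming Stable Diffusion's scaled_linear beta schedule; the endpoint values are a property of that schedule, not of this PR.

```python
import torch

# Given a VP (DDPM-style) beta schedule, the equivalent VE (k-diffusion)
# sigmas are sqrt((1 - alpha_bar_t) / alpha_bar_t).
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2  # scaled_linear
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
sigmas = ((1 - alphas_cumprod) / alphas_cumprod) ** 0.5
print(sigmas.min(), sigmas.max())  # roughly 0.03 and 14.6 for these betas
```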

keturn avatar Oct 15 '22 19:10 keturn

FYI to the thread, my colleague just showed me that there is already a 0.3.0 implementation of diffusers (incorrectly labelled 0.5.1) that incorporates these k schedulers, done by @hlky. Check it out. Going to test it now.

https://github.com/hlky/diffusers/tree/main

tonetechnician avatar Oct 16 '22 07:10 tonetechnician

There are seemingly a few changes to make for the current diffusers scheduler mixins and new API, but I have started this process. Once I get somewhere, I'd be happy to make a PR with these changes.

tonetechnician avatar Oct 16 '22 08:10 tonetechnician

@tonetechnician Re "(incorrectly labelled 0.5.1)": it was 0.3.0 when I did it; I'm not sure where you're seeing it labelled as 0.5.1.

@hafriedlander had been using my implementation of k_euler and k_euler_a in stable-diffusion-grpcserver. These two were already producing 1:1 results.

k_dpm_2, k_dpm_2_a and k_heun were not quite right, because in diffusers the call to the unet is done in the pipeline, whereas these samplers do a second pass (this is why they take twice as long): k_heun, k_dpm_2 and k_dpm_2_a each make a second model() call mid-step to get a fresh model_output.

k_dpm_2_a and k_heun are now producing 1:1 results; we still need to check what's going on with k_dpm_2.

As soon as there is a complete implementation that produces 1:1 results we will submit a pull request.

If anyone has any questions or wants to help, feel free to reach out to me or @hafriedlander.

(Below is just some info on what is equivalent between crowsonkb's implementation and diffusers'.)

s_in = x.new_ones([x.shape[0]]) = model_output
x = sample
i = timestep
sigmas[i] = sigma or self.sigmas[timestep]
denoised = model(x, sigmas[i] * s_in, **extra_args) = pred_original_sample = sample - sigma * model_output

x_2 = x + d * dt_1
denoised_2 = model(x_2, sigma_mid * s_in, **extra_args)

=

sample_2 = sample + derivative * dt_1
pred_original_sample_2 = sample_2 - sigma_mid * model_output

hlky avatar Oct 16 '22 09:10 hlky

Thanks so much for the response @hlky!

it was 0.3.0 when I did it, I'm not sure where you're seeing it labelled as 0.5.1

This is just in the README, currently it's referencing the huggingface diffusers version badge https://github.com/hlky/diffusers/blob/main/README.md?plain=1#L11

Really promising to hear the 1:1 output from your implementation and the original. Going to be playing with it a bit this evening. I'm happy to help port it to 0.5.1.

As soon as there is a complete implementation that produces 1:1 results we will submit a pull request.

I think then I'll fork your repo and start making changes there. Happy to help where I can. Will also have a look at dpm_2 and was also considering implementing the karras noise schedule for completeness.

tonetechnician avatar Oct 16 '22 14:10 tonetechnician

@hafriedlander finished off the implementation earlier

k_euler, k_euler_a, k_dpm_2, k_dpm_2_a and k_heun now produce 1:1 results.

https://github.com/hafriedlander/stable-diffusion-grpcserver/tree/main/sdgrpcserver/pipeline/old_schedulers

hlky avatar Oct 16 '22 15:10 hlky

Amazing! Defs gonna try this.

I've just merged 0.5.1 with your fork and testing it out to make sure everything works. Will incorporate these here and if all goes well, can make a PR to your repo

tonetechnician avatar Oct 16 '22 16:10 tonetechnician

@tonetechnician @hlky's fork is pretty old now. My repo is based on Diffusers 0.4.2, so upgrading shouldn't be too painful.

However I've backported 0.3.0-style scheduler support into my repo to continue to support these schedulers. It's probably the main blocker to raising a PR here - they're all still NumPy based and need specific handling.

I'll probably raise a PR for discussion anyway, but updating the schedulers to be more 0.4.2-style would be helpful. I'm not quite sure how to handle scale_model_input yet.

hafriedlander avatar Oct 16 '22 20:10 hafriedlander

Thanks for the info @hafriedlander.

My repo is based on Diffusers 0.4.2, so upgrading shouldn't be too painful.

Indeed, it seems alright, I think. I've started porting your 0.4.2 versions to 0.5.1 but notice the output isn't quite correct; I need to dig a bit more into exactly why. I get output like this (prompt is "A dog smiling"). It might be due to precision types or something like that.

[output image]

However I've backported 0.3.0-style scheduler support into my repo to continue to support these schedulers. It's probably the main blocker to raising a PR here - they're all still NumPy based and need specific handling.

I see! Yes, I've now converted the schedulers you've implemented to be torch-specific, and this seems to be working fine (minus the strange image output).

I'll probably raise a PR for discussion anyway, but updating the schedulers to be more 0.4.2-style would be helpful. I'm not quite sure how to handle scale_model_input yet.

Awesome, I think this would be great. As for scale_model_input, it's not too bad. I've adopted the method used by other schedulers that don't scale input:

    def scale_model_input(self, sample: torch.FloatTensor, timestep: Optional[int] = None) -> torch.FloatTensor:
        """
        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
        current timestep.

        Args:
            sample (`torch.FloatTensor`): input sample
            timestep (`int`, optional): current timestep

        Returns:
            `torch.FloatTensor`: scaled input sample
        """
        return sample

Please do link the PR here once you've got it. I'm currently working from @hlky's fork, but I've copied your implementation of the schedulers over there already.

We will also need to work on the Img2Img pipeline, as that currently isn't working correctly. I'm busy with that at the moment; after it's sorted, I plan to complete the upgrade to 0.5.1.

tonetechnician avatar Oct 16 '22 20:10 tonetechnician

If you're using my update of the schedulers but not my unified_pipeline, it probably won't work. (That's my guess for the camo dog.) There are quite a few special-handling cases in the pipeline (the schedulers need to re-call the UNet and pass custom sigma values).

hafriedlander avatar Oct 16 '22 21:10 hafriedlander

Great to know, thanks for pointing it out.

I was reading through both your and @hlky's implementations and see there are a number of notable changes to accommodate the schedulers. But it's so awesome that you've got it going! I don't think porting to 0.5.1 is going to be difficult.

It is quite late where I am now so I'll pick it up first thing in the morning with a fresh mind and get it working.

tonetechnician avatar Oct 16 '22 21:10 tonetechnician

Just a quick update: I have the new schedulers incorporated into the other pipelines from @hlky's repo and have been testing, and so far so good!

Now going to complete the update to 0.5.1. Once that's done I'll make a PR to the diffusers base repo and link it here.

But looking good. Next thing will be getting tests going.

tonetechnician avatar Oct 17 '22 16:10 tonetechnician

That's awesome, @tonetechnician & @AbdullahAlfaraj! Let us know if you need any help with the PR, very happy to help :)

patil-suraj avatar Oct 20 '22 12:10 patil-suraj

Also, feel free to already open a PR if you want :) Pretty excited to try it out !!

patil-suraj avatar Oct 20 '22 13:10 patil-suraj

Thanks! @patil-suraj

Had a small hiccup with some of the implementation, so I've opened issue #901 and am speaking with @patrickvonplaten. But I'll make a PR later today with where I've got to so far. I'm getting output in 0.5.1 that looks reasonable, but it needs more testing, as there do seem to be some changes.

As both @hlky and @hafriedlander have mentioned in other posts, I do see that for the DPM schedulers we need to pipe the model back into the scheduler. For them to work with the base diffusers pipelines, we'll probably need to make some adjustments to how this works, which can be discussed in the PR. I don't think it should be too difficult to add that in a non-destructive way.

tonetechnician avatar Oct 21 '22 08:10 tonetechnician

Hey @tonetechnician,

This is a great PR and we would very much like to help/unblock you if you have any questions! Please let me know if anything is unclear or you need help. Regarding wrapping the model and the sampler: we have quite a strong requirement not to pass the model into a scheduler function; we could, however, first call a scheduler function if necessary. Feel free to ask about any specific design questions, I'm more than happy to help :-)

patrickvonplaten avatar Oct 24 '22 20:10 patrickvonplaten

@patrickvonplaten tricky to work around that constraint

You can see here: https://github.com/hafriedlander/stable-diffusion-grpcserver/blob/776502981a872ddaa2f6b7c2ec9a58adf0154372/sdgrpcserver/pipeline/old_schedulers/scheduling_dpm2_discrete.py#L168

The scheduler calculates a predicted state, and then re-calculates the noise prediction for the next step based on that intermediate state.

The other way to achieve that could be the scheduler returning some sort of flag to the pipeline saying it's not done and needs re-calling with some (opaque) state object, plus the result of a second call to the unet?

Pseudo-code:

scheduler_state = None
do:
  prediction = unet(latents)
  prediction = apply_guidance(prediction)
  done, scheduler_state, latents = scheduler(latents, prediction, scheduler_state)
while not done

Or I guess the scheduler could lie about the timesteps and double them up, just to cause itself to be called twice, remembering its own state internally. That might even be better? There's no branching logic in the scheduler (except for handling the last step, where there is no next step to re-call on).
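To sanity-check that second idea, here's a toy, self-contained sketch (hypothetical class name; a scalar ODE stands in for the unet): the scheduler repeats each step index, stashes the first-pass state internally, and a plain call-model-then-step loop ends up performing a two-pass Heun update without the scheduler ever calling the model.

```python
# Toy "doubled timesteps" scheduler: each real step index appears twice in
# self.timesteps, so the pipeline's blind loop calls step() twice per step.
# The first call does an Euler predictor and stashes state; the second call
# applies the Heun corrector using both model outputs.
class HeunViaDoubledSteps:
    def __init__(self, sigmas):
        self.sigmas = sigmas
        self.timesteps = [i for i in range(len(sigmas) - 1) for _ in range(2)]
        self._stash = None  # opaque state carried between the two passes

    def step(self, d, i, x):
        dt = self.sigmas[i + 1] - self.sigmas[i]
        if self._stash is None:
            self._stash = (x, d)
            return x + d * dt               # first pass: Euler predictor
        x0, d0 = self._stash
        self._stash = None
        return x0 + 0.5 * (d0 + d) * dt     # second pass: Heun corrector

# Toy "model": dx/dt = x integrated from t=0 to 1; exact answer is e.
sched = HeunViaDoubledSteps([0.0, 0.5, 1.0])
x = 1.0
for i in sched.timesteps:
    d = x  # stand-in for the unet call: derivative at the current sample
    x = sched.step(d, i, x)
print(x)  # close to e = 2.71828 (two Heun steps give 2.640625)
```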

hafriedlander avatar Oct 24 '22 21:10 hafriedlander

@patil-suraj do you want to take a look here?

patrickvonplaten avatar Nov 04 '22 18:11 patrickvonplaten

@patil-suraj do you want to take a look here?

On my todo-list, will get back to this as soon as possible.

patil-suraj avatar Nov 08 '22 13:11 patil-suraj