Add missing get_velocity function to DEISMultistepScheduler

saunderez opened this pull request Feb 14 '23 · 8 comments

Add the missing get_velocity function (required for v-prediction) from DDPMScheduler (see #2351).

This allows DEISMultistepScheduler to be a drop-in replacement for DDPMScheduler.
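For reference, the DDPMScheduler implementation being copied over is roughly the following (paraphrased from the diffusers source of the time; it only depends on self.alphas_cumprod):

import torch

def get_velocity(self, sample: torch.FloatTensor, noise: torch.FloatTensor, timesteps: torch.IntTensor) -> torch.FloatTensor:
    # Match the schedule tensor to the sample's device and dtype
    alphas_cumprod = self.alphas_cumprod.to(device=sample.device, dtype=sample.dtype)
    timesteps = timesteps.to(sample.device)

    # Broadcast sqrt(alpha_bar_t) over the sample dimensions
    sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
    sqrt_alpha_prod = sqrt_alpha_prod.flatten()
    while len(sqrt_alpha_prod.shape) < len(sample.shape):
        sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

    # Broadcast sqrt(1 - alpha_bar_t) the same way
    sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
    sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
    while len(sqrt_one_minus_alpha_prod.shape) < len(sample.shape):
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

    # v-prediction target: v = sqrt(alpha_bar_t) * noise - sqrt(1 - alpha_bar_t) * x_0
    velocity = sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
    return velocity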

saunderez avatar Feb 14 '23 16:02 saunderez

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Thanks for adding this @saunderez I think it looks good - do you want to run make quality so the code quality checks can pass?

@patrickvonplaten @williamberman can we merge this in? otherwise can't use v_prediction in DEISMultistepScheduler during training

yiyixuxu avatar Feb 15 '23 03:02 yiyixuxu

Thanks for adding this @saunderez I think it looks good - do you want to run make quality so the code quality checks can pass?

I've tried but unfortunately I'm having a little bit of trouble with that....will try again when I get home from work

saunderez avatar Feb 15 '23 04:02 saunderez

All ready to go @yiyixuxu !

Happy to be a contributor for once, hopefully there'll be more to come :)

saunderez avatar Feb 15 '23 06:02 saunderez

Hey @saunderez,

Sorry, could you clarify the use case a bit?

There really shouldn't be a use case where one needs the DEISMultistepScheduler instead of DDPMScheduler during training; e.g. we always use the DDPMScheduler in our examples: https://github.com/huggingface/diffusers/blob/e3ddbe25edeadaa5afc3f8f5bb0d645098a8b26a/examples/dreambooth/train_dreambooth.py#L617 and all the UNets trained that way are then compatible with the DEIS scheduler during inference.
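For illustration, swapping in DEIS at inference is then a one-liner (a sketch; the checkpoint path is a placeholder):

from diffusers import StableDiffusionPipeline, DEISMultistepScheduler

# Load a UNet trained with the DDPM noise schedule, then solve with DEIS at inference
pipe = StableDiffusionPipeline.from_pretrained("path/to/your-trained-model")  # placeholder path
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)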

patrickvonplaten avatar Feb 16 '23 10:02 patrickvonplaten

We have been testing DEIS as an alternative scheduler for training in the dev branch of the Dreambooth extension for the Automatic1111 WebUI. It is currently a user-selectable toggle.

https://github.com/d8ahazard/sd_dreambooth_extension

While I wouldn't describe the extension as cutting edge, the sd_dreambooth_extension community does not shy away from testing implementations with potential benefit, such as LoRA and DEIS. Lion was pushed to dev as a potential replacement for AdamW today, and early results are extremely promising. I believe we can improve training quality, and the process in general, by using state-of-the-art tech when possible.

Back to the pull request itself....

DEIS was implemented in dev for testing not long after release, and several members of the Discord had success with 1.5-based models; opinion was favourable. As someone who trains 2.1 768 models almost exclusively, I was the first to encounter the missing function, and it was immediately obvious that a simple copy and paste would allow me to continue testing.

Since implementing it in my codebase I have trained several models successfully. No A/B analysis has been done so far, but I have definitely noticed higher-quality samples at low step counts, and I think that alone warrants implementing get_velocity for training.

saunderez avatar Feb 16 '23 10:02 saunderez

To further add color to @saunderez's use case, we've found that using DEIS in place of DDIM/DDPM for the noise sampler improves training speed without any effect on output quality, but without v-pred, it breaks V2.1 models.

d8ahazard avatar Feb 16 '23 13:02 d8ahazard

I see, would love a second opinion here from @pcuenca @patil-suraj @williamberman

patrickvonplaten avatar Feb 16 '23 18:02 patrickvonplaten

To further add color to @saunderez's use case, we've found that using DEIS in place of DDIM/DDPM for the noise sampler improves training speed without any effect on output quality, but without v-pred, it breaks V2.1 models.

Thanks for the extra detail :)

saunderez avatar Feb 22 '23 21:02 saunderez

Gently pinging @patil-suraj @williamberman and @pcuenca again

patrickvonplaten avatar Mar 03 '23 18:03 patrickvonplaten

I have no experience training with DEIS, but the PR looks fine to me. The change is isolated to this use case and shouldn't impact anything, unless the user manually configures their training scripts to use it.

I can try to set up a couple of Dreambooth runs today.

pcuenca avatar Mar 06 '23 08:03 pcuenca

I'm a bit confused, I believe the "noise schedule" is the same regardless of the Scheduler class. I don't think we get anything out of adding this method to other schedulers.

This is where the diffusers naming can get confusing, sorry about this. We have several Scheduler classes which each contain a "noise schedule" and a set of methods for solving for the next step in the denoising process given what the model outputs. In the training scripts, the noise_scheduler is only used for its noise schedule and not for any of the solving methods. This makes sense as there's no iterative denoising process during training.

The noise schedule is configured by num_train_timesteps, beta_start, beta_end, beta_schedule, and trained_betas which are always used the same way.

The "solving methods" are configured by variance_type and prediction_type.

Maybe there's something about DEISMultistepScheduler that I'm missing? Also, using an alternative noise_scheduler for the training scripts would mean we'd have to make the class configurable somehow, i.e. just adding the get_velocity method isn't sufficient.
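For concreteness, here is a minimal sketch of everything the training scripts actually call on noise_scheduler (variable names follow the standard diffusers training loops):

# Training only consumes the noise schedule; no solver method is ever called
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
# noise_scheduler.step(...) never runs here: there is no iterative denoising during training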

williamberman avatar Mar 06 '23 19:03 williamberman

@williamberman @saunderez As I understand this, the get_velocity function should indeed be the same for all Scheduler (sampler) classes given the same noise schedule parameters, but must be implemented to yield a training target for models that use v-prediction. In fact, it may be good to:

  • Mark get_velocity with the # Copied from mechanism (sketched below) to propagate the implementation, for the time being, to all Scheduler classes that define self.alphas_cumprod consistent with the original DDPM stochastic-process formulation. The same goes for the Scheduler classes that define self.sigmas consistent with the SDE/ODE formulation in Karras et al. 2022.
  • As a long-term goal, refactor noise schedules into a separate class that is then passed as a Scheduler constructor argument.
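Concretely, the first point would look something like this in the DEIS scheduler file (a sketch; make fix-copies would then keep the method body in sync with the DDPM source):

class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):
    ...
    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.get_velocity
    def get_velocity(self, sample, noise, timesteps):
        ...  # identical to the DDPM implementation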

clarencechen avatar Mar 10 '23 05:03 clarencechen

@williamberman @saunderez As I understand this, the get_velocity function should indeed be the same for all Scheduler (sampler) classes given the same noise schedule parameters, but must be implemented to yield a training target for models that use v-prediction. In fact, it may be good to:

  • Mark get_velocity with the # Copied from mechanism to propagate the implementation, for the time being, to all Scheduler classes that define self.alphas_cumprod consistent with the original DDPM stochastic-process formulation. The same goes for the Scheduler classes that define self.sigmas consistent with the SDE/ODE formulation in Karras et al. 2022.
  • As a long-term goal, refactor noise schedules into a separate class that is then passed as a Scheduler constructor argument.

Exactly right, and a great idea. Without it, schedulers aren't fully compatible with v-prediction, and it's such an easy win.

We've monkey-patched it in for our purposes to implement DEIS, and have since done the same for UniPCMultistepScheduler, which is also lacking get_velocity (and is now my go-to for training because of the speed!).
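For reference, the patch itself is tiny, since the DDPM implementation only reads self.alphas_cumprod, which both schedulers define (a sketch of what we do; not an officially supported API):

from diffusers import DDPMScheduler, DEISMultistepScheduler, UniPCMultistepScheduler

# Borrow get_velocity from DDPMScheduler; it only touches self.alphas_cumprod
DEISMultistepScheduler.get_velocity = DDPMScheduler.get_velocity
UniPCMultistepScheduler.get_velocity = DDPMScheduler.get_velocity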

Considering the outstanding work done to provide standardised interfaces in Diffusers, the inconsistency here is a bit out of place.

saunderez avatar Mar 10 '23 16:03 saunderez

To further add color to @saunderez's use case, we've found that using DEIS in place of DDIM/DDPM for the noise sampler improves training speed without any effect on output quality, but without v-pred, it breaks V2.1 models.

@d8ahazard could you help explain where it breaks? Are you suggesting that our text_to_image example breaks if one uses the v2-1 checkpoint as the base model?

sayakpaul avatar Mar 16 '23 10:03 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 09 '23 15:04 github-actions[bot]

To further add color to @saunderez's use case, we've found that using DEIS in place of DDIM/DDPM for the noise sampler improves training speed without any effect on output quality, but without v-pred, it breaks V2.1 models.

@d8ahazard could you help explain where it breaks? Are you suggesting that our text_to_image example breaks if one uses the v2-1 checkpoint as the base model?

When training, it breaks if we use DEIS for the noise sampler without monkey-patching in v-prediction.

Here, I believe:

https://github.com/d8ahazard/sd_dreambooth_extension/blob/926ae204ef5de17efca2059c334b6098492a0641/dreambooth/train_dreambooth.py#L1151
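For context, that branch mirrors the standard diffusers training scripts, which compute the training target roughly like this (a sketch with the usual variable names):

if noise_scheduler.config.prediction_type == "epsilon":
    target = noise
elif noise_scheduler.config.prediction_type == "v_prediction":
    # Raises AttributeError when the scheduler class lacks get_velocity
    target = noise_scheduler.get_velocity(latents, noise, timesteps)
else:
    raise ValueError(f"Unknown prediction type {noise_scheduler.config.prediction_type}")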

d8ahazard avatar Apr 09 '23 15:04 d8ahazard

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 06 '23 15:05 github-actions[bot]

BUMP.

Can we please merge this already? I'm tired of having to monkey-patch it in my code. Lol.

d8ahazard avatar May 06 '23 15:05 d8ahazard

We will need to solve the quality check here before merging this. Nevertheless, I also recommend simply using the DDPMScheduler for training (add_noise) and then the DEISMultistepScheduler for inference. Schedulers can be switched in and out on the fly:

from diffusers import DDPMScheduler, DEISMultistepScheduler

scheduler = DDPMScheduler.from_config(pipeline.scheduler.config)  # noise schedule for training
scheduler = DEISMultistepScheduler.from_config(pipeline.scheduler.config)  # DEIS solver for inference

patrickvonplaten avatar May 08 '23 08:05 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 01 '23 15:06 github-actions[bot]