Add `v`-prediction

Open pcuenca opened this issue 3 years ago • 4 comments

Like in the Imagen Video paper. According to Katherine Crowson it's more stable and faster: https://twitter.com/rivershavewings/status/1578193039423852544?s=21&t=oIToMoHbtU1U8VH3q4Mzng
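For context, the `v` objective from the progressive distillation paper mixes the noise and the clean sample according to the noise level. A minimal scalar sketch, assuming the usual DDPM forward process `x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps` (the function name is mine, not a diffusers API):

```python
import math

def get_velocity(x0, eps, alpha_bar_t):
    # v-prediction target (progressive distillation / Imagen Video):
    #   v = sqrt(alpha_bar_t) * eps - sqrt(1 - alpha_bar_t) * x0
    # Scalar sketch; in the library these would be tensor ops.
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return a * eps - s * x0

# At alpha_bar_t = 1 (no noise) v reduces to eps; at 0 it is -x0.
print(get_velocity(2.0, 0.5, 1.0))  # 0.5
```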

pcuenca avatar Oct 08 '22 03:10 pcuenca

Any ideas how we should add it? I guess we should try it out in a training run, no? Maybe a simple DDPM training run? Also, I'm not sure anybody has time to try this out soon. Is someone in the community interested in this, maybe?

patrickvonplaten avatar Oct 10 '22 12:10 patrickvonplaten

Also cc @borisdayma

patrickvonplaten avatar Oct 10 '22 12:10 patrickvonplaten

Maybe there could be something like a `prediction_type` or `objective` option which defaults to `epsilon` (still probably the standard) but can also be set to `x` or `v`.
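A rough sketch of how a scheduler step could branch on such a flag to recover the clean sample under each parameterization. The helper name `predict_x0` and the string values are illustrative only, not a settled API:

```python
import math

def predict_x0(model_output, sample, alpha_bar_t, prediction_type="epsilon"):
    # Hypothetical helper: recover x0 from the model output under each
    # of the three proposed objectives (scalar sketch, not library code).
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    if prediction_type == "epsilon":        # model predicts the noise
        return (sample - s * model_output) / a
    elif prediction_type == "x":            # model predicts x0 directly
        return model_output
    elif prediction_type == "v":            # model predicts v
        return a * sample - s * model_output
    raise ValueError(f"unknown prediction_type: {prediction_type}")
```

The remaining scheduler math can then stay objective-agnostic, since each branch hands back the same quantity.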

borisdayma avatar Oct 10 '22 14:10 borisdayma

@borisdayma when I first saw the conversation that's the API I thought of. This could be my next thing after RL / unet1d.

natolambert avatar Oct 10 '22 21:10 natolambert

Hey! I'm not sure if @natolambert or someone else will work on this one, but if not, I'd like to take it.

LiviaCavalcanti avatar Oct 12 '22 01:10 LiviaCavalcanti

This is especially useful for things like progressive distillation (https://arxiv.org/abs/2202.00512), where they found that predicting the noise (epsilon) had some issues and instead recommend one of:

  • Predicting `x` directly
  • Predicting both via separate channels and smoothly interpolating between them based on the noise level (I think this is equivalent to the `c_skip` trick that Karras et al. use)
  • Predicting the `v` objective

Having a choice of just `epsilon`, `v`, and `x` seems like a good place to start; more complex custom objectives can be something we think about a little later if that turns out to be necessary.

johnowhitaker avatar Oct 12 '22 06:10 johnowhitaker

There will have to be some additional logic in all of the samplers to handle models with different objectives, but given any one of the three listed you can derive the other two as long as you know the noise level.
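The conversions are all linear in the sample and the prediction, so a small set of identities covers every direction. A sketch under the same `x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps` convention (function name and keyword-argument signature are mine, purely illustrative):

```python
import math

def complete_parameterization(sample, alpha_bar_t, *, eps=None, x0=None, v=None):
    # Given x_t, the noise level, and any ONE of (eps, x0, v),
    # derive the other two. With a = sqrt(alpha_bar_t), s = sqrt(1 - alpha_bar_t):
    #   x_t = a*x0 + s*eps        v   = a*eps - s*x0
    #   x0  = a*x_t - s*v         eps = s*x_t + a*v
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    if eps is not None:
        x0 = (sample - s * eps) / a
    elif x0 is not None:
        eps = (sample - a * x0) / s
    elif v is not None:
        x0 = a * sample - s * v
        eps = s * sample + a * v
    if v is None:
        v = a * eps - s * x0
    return eps, x0, v
```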

johnowhitaker avatar Oct 12 '22 06:10 johnowhitaker

I will post here if / when I find the time to start it, but feel free to start @LiviaCavalcanti. I have a couple other things I need to finish first.

natolambert avatar Oct 12 '22 15:10 natolambert

Are there any models on the hub that would make good examples for this? Or is it better if we train a small one in a colab to showcase the effects of different prediction modes?

natolambert avatar Oct 12 '22 15:10 natolambert

@natolambert I'll work on your showcase suggestion so we have an idea of what to do next, OK? I'm open to other suggestions.

LiviaCavalcanti avatar Oct 13 '22 02:10 LiviaCavalcanti

Some of the schedulers need sigmas. We'll need a lot of tests. The Flax models will also need a somewhat different implementation to stay jit-friendly. I've been briefly studying these libraries.
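For the sigma-based (Karras-style) schedulers, the k-diffusion sigma relates to the DDPM noise level via `alpha_bar = 1 / (1 + sigma**2)`, so the same `v` identities can be reused once sigma is mapped back. A hedged sketch; actual schedulers may scale or index their sigmas differently:

```python
import math

def sigma_to_alpha_sigma(sigma):
    # Map a k-diffusion noise level sigma onto the
    # (sqrt(alpha_bar), sqrt(1 - alpha_bar)) pair used above,
    # via alpha_bar = 1 / (1 + sigma**2). Illustrative only.
    alpha_bar = 1.0 / (1.0 + sigma ** 2)
    return math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)

# sigma = 0 is the clean-data end: alpha_bar = 1.
print(sigma_to_alpha_sigma(0.0))  # (1.0, 0.0)
```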

@LiviaCavalcanti

natolambert avatar Oct 13 '22 03:10 natolambert

Hi, I'm interested in helping out. Is this task already taken? If not, could I work on this task?

aandyw avatar Oct 14 '22 16:10 aandyw

#818 is not complete. There are many things to do:

  • test the implementation (preferably in a Colab we can look at and contribute to)
  • tests in the library
  • implementing this in the other schedulers too

Feel free to open a PR on my PR if you get ahead, or keep discussing here!

natolambert avatar Oct 14 '22 18:10 natolambert

Hey @natolambert! Sorry for the delay, I had many setbacks last week. I have a sketch of the solution. I'd like you to comment on it, but I don't know how to share it, since there's still too much work left to open a PR.

LiviaCavalcanti avatar Oct 21 '22 01:10 LiviaCavalcanti

@LiviaCavalcanti your best bet is to open a standalone PR, or to open a PR against my PR #818; that's the only way for us to give feedback in public :). Link it to this issue too if you can.

natolambert avatar Oct 21 '22 02:10 natolambert

Just wanted to update everyone on this issue: @bglick13 (mostly) and I made a lot of progress on this in #818 -- feel free to take a look.

natolambert avatar Nov 17 '22 23:11 natolambert

@pcuenca we can close this issue, no?

williamberman avatar Feb 13 '23 05:02 williamberman