diffusers
Add `v`-prediction
Like in the Imagen Video paper. According to Katherine Crowson it's more stable and faster: https://twitter.com/rivershavewings/status/1578193039423852544?s=21&t=oIToMoHbtU1U8VH3q4Mzng
Any ideas on how we should add it? I guess we should try it out in a training run, no? Maybe a simple DDPM training? Also, I'm not sure anybody has time to try this out soon. Is someone in the community interested in this, maybe?
Also cc @borisdayma
Maybe there could be something like a prediction_type or objective which defaults to epsilon (still probably the standard) but can also be set to x or v.
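To make the suggestion concrete, here is a minimal sketch of what such an option could look like. The class name `SchedulerConfig` and the accepted strings are hypothetical, chosen only to mirror the `epsilon` / `x` / `v` choices mentioned above; this is not the library's actual API.

```python
from dataclasses import dataclass

# Hypothetical option values: they mirror the names used in this thread,
# not any existing library constant.
VALID_PREDICTION_TYPES = ("epsilon", "x", "v")


@dataclass(frozen=True)
class SchedulerConfig:
    """Toy config object (hypothetical) carrying the proposed option."""

    num_train_timesteps: int = 1000
    # Defaulting to "epsilon" keeps the behavior of existing models unchanged.
    prediction_type: str = "epsilon"

    def __post_init__(self):
        # Fail early on typos instead of silently training the wrong objective.
        if self.prediction_type not in VALID_PREDICTION_TYPES:
            raise ValueError(
                f"prediction_type must be one of {VALID_PREDICTION_TYPES}, "
                f"got {self.prediction_type!r}"
            )


default_config = SchedulerConfig()
v_config = SchedulerConfig(prediction_type="v")
```

Keeping the default at `epsilon` means existing checkpoints and configs would not need to change.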
@borisdayma when I first saw the conversation that's the API I thought of. This could be my next thing after RL / unet1d.
Hey! I am not sure if @natolambert or someone else will work on this one, but if not, I'd like to take it
This is especially useful for things like progressive distillation (https://arxiv.org/abs/2202.00512) where they found predicting the noise (epsilon) had some issues, and instead recommend either:
- Predicting x directly
- Predicting both via separate channels and smoothly interpolating between them based on the noise level (I think this is equivalent to the c_skip trick that Karras et al. use)
- Predicting the v objective
Having just a choice of epsilon, v and x seems like a good place to start, and maybe more complex custom ones can be something we think about a little later if that turns out to be necessary.
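As a rough sketch of what the three choices mean at training time: assuming the usual variance-preserving forward process `z_t = alpha_t * x0 + sigma_t * noise` with `alpha_t**2 + sigma_t**2 == 1`, and the definition `v = alpha_t * noise - sigma_t * x0` from the progressive distillation paper, the regression target could be picked like this. The function name and the option strings are hypothetical.

```python
def training_target(x0, noise, alpha_t, sigma_t, prediction_type="epsilon"):
    """Return the regression target for one of the three objectives.

    Assumes z_t = alpha_t * x0 + sigma_t * noise (variance preserving).
    Only arithmetic operators are used, so arrays or tensors that support
    broadcasting should work in place of the floats used here.
    """
    if prediction_type == "epsilon":
        return noise  # the model predicts the added noise
    if prediction_type == "x":
        return x0  # the model predicts the clean sample
    if prediction_type == "v":
        return alpha_t * noise - sigma_t * x0  # the v objective
    raise ValueError(f"unknown prediction_type: {prediction_type!r}")


# Example with scalars: alpha_t = 0.6, sigma_t = 0.8 satisfy 0.36 + 0.64 = 1.
v_target = training_target(x0=2.0, noise=-1.0, alpha_t=0.6, sigma_t=0.8,
                           prediction_type="v")
```

The loss itself (for example MSE between the model output and this target) would stay the same for all three objectives.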
There will have to be some additional logic in all of the samplers to handle models with different objectives, but given any one of the three listed you can derive the other two as long as you know the noise level.
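The "derive the other two" step can be sketched as follows, again assuming a variance-preserving parameterization `z_t = alpha_t * x0 + sigma_t * noise` with `alpha_t**2 + sigma_t**2 == 1` and `v = alpha_t * noise - sigma_t * x0`. The helper name `to_x0_and_epsilon` and the option strings are hypothetical; a real sampler would plug the recovered quantities into its own update rule.

```python
def to_x0_and_epsilon(model_output, z_t, alpha_t, sigma_t, prediction_type):
    """Convert a model output into (x0, epsilon) given the noise level.

    Assumes z_t = alpha_t * x0 + sigma_t * epsilon and
    alpha_t**2 + sigma_t**2 == 1 (needed for the "v" branch).
    """
    if prediction_type == "epsilon":
        epsilon = model_output
        x0 = (z_t - sigma_t * epsilon) / alpha_t
    elif prediction_type == "x":
        x0 = model_output
        epsilon = (z_t - alpha_t * x0) / sigma_t
    elif prediction_type == "v":
        v = model_output
        x0 = alpha_t * z_t - sigma_t * v
        epsilon = sigma_t * z_t + alpha_t * v
    else:
        raise ValueError(f"unknown prediction_type: {prediction_type!r}")
    return x0, epsilon


# Round trip with scalars: build z_t and v from a known x0 and noise.
alpha_t, sigma_t = 0.6, 0.8
true_x0, true_noise = 2.0, -1.0
z_t = alpha_t * true_x0 + sigma_t * true_noise
v = alpha_t * true_noise - sigma_t * true_x0
x0_from_v, eps_from_v = to_x0_and_epsilon(v, z_t, alpha_t, sigma_t, "v")
```

Note that the `epsilon` branch divides by `alpha_t` and the `x` branch divides by `sigma_t`, so each is ill-conditioned at one end of the noise schedule; the `v` branch needs no division under the stated assumption.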
I will post here if / when I find the time to start it, but feel free to start @LiviaCavalcanti. I have a couple other things I need to finish first.
Are there any models on the hub that would make good examples for this? Or is it better if we train a small one in a colab to showcase the effects of different prediction modes?
@natolambert I will work on your showcase suggestion so we have an idea of what to do next, ok? I'm open to other suggestions.
Some of the schedulers need sigmas. We'll need a lot of tests. The Flax models will also need a slightly different implementation to make it jit-friendly. I was briefly studying these libraries.
@LiviaCavalcanti
Hi, I'm interested in helping out. Is this task already taken? If not, could I work on this task?
#818 is not complete. There are many things to do:
- test the implementation (preferably in a colab we can look at / contribute to too)
- tests in the library
- implementing this in other schedulers too
Feel free to open a PR on my PR if you get ahead, or keep discussing here!
Hey @natolambert! Sorry for the delay, I had many setbacks last week. I have a sketch of the solution. I'd like you to comment on it, but I don't know how, since there's still so much work left before I can open a PR.
@LiviaCavalcanti best bet is to open a standalone PR or open a PR against my PR #818; that's the only way for us to give feedback in public :). Link it to this issue too if you can.
Just wanted to update everyone in the issue that @bglick13 (mostly) and I made a lot of progress on this in #818. Feel free to take a look.
@pcuenca we can close this issue, no?