Clean up 1d unet blocks

Open natolambert opened this issue 2 years ago • 9 comments

Some changes motivated by #105.

natolambert avatar Oct 26 '22 22:10 natolambert

@patrickvonplaten I looked at this really closely and I don't see a reason why the dance_diffusion test is failing. Do you agree? Hmmmm... testing is really tricky here.

natolambert avatar Oct 26 '22 23:10 natolambert

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

HuggingFaceDocBuilder avatar Nov 14 '22 22:11 HuggingFaceDocBuilder

Two TODOs in order to merge this:

  1. update the assertion test in dance diffusion, as it's pretty clear the underlying model logic is the same (multiple folks have checked it); see the sketch below
  2. update the config for the RL models on the hub to get the mid_layer functionality in. @bglick13 is this easy?
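For reference, a minimal sketch (not the actual diffusers test code) of the expected-slice assertion pattern that point 1 refers to, including how the pinned values get regenerated:

```python
# Minimal sketch, not the real dance diffusion fast test: pipeline tests pin a
# few output values under a fixed seed and compare them within a tolerance.
import numpy as np
import torch

torch.manual_seed(0)
audio = torch.randn(1, 2, 64)  # stand-in for pipeline output: (batch, channels, samples)

audio_slice = audio[0, -1, -3:].numpy()
print(audio_slice)  # run once and paste the printed numbers below

# The real test hardcodes these; they must be regenerated whenever a change
# alters weight initialization, even if the model math is unchanged.
expected_slice = audio[0, -1, -3:].numpy()  # placeholder for the pasted values
assert np.abs(audio_slice - expected_slice).max() < 1e-2
```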

natolambert avatar Nov 14 '22 22:11 natolambert

update the config for the RL models on the hub to get the mid_layer functionality in. @bglick13 is this easy?

Done!

bglick13 avatar Nov 17 '22 18:11 bglick13

@patrickvonplaten FYI I checked that the Harmonai slow test passes. I changed the values of the fast dance_diffusion_pipeline test, which I think is needed because PyTorch initializes weights differently after the module changes for variable layer counts (not 100% sure, it's a little tricky).
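To illustrate (a toy sketch, not diffusers code): with a fixed seed, every parameter tensor draws from the same global RNG stream, so constructing a different number of modules shifts all later initializations, even when the resulting architecture is mathematically equivalent:

```python
import torch
import torch.nn as nn

def build(num_mid_layers: int) -> nn.Sequential:
    # Stand-in for a UNet whose mid block depth is now configurable.
    layers = [nn.Conv1d(8, 8, 3, padding=1) for _ in range(num_mid_layers)]
    layers.append(nn.Conv1d(8, 4, 3, padding=1))  # final "output" layer
    return nn.Sequential(*layers)

torch.manual_seed(0)
out_a = build(num_mid_layers=1)[-1].weight.clone()

torch.manual_seed(0)
out_b = build(num_mid_layers=2)[-1].weight.clone()

# Same seed, same final-layer shape, different weights: the extra mid layer
# consumed RNG draws first, so every downstream initialization diverges.
print(torch.allclose(out_a, out_b))  # False
```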

Will see if all the tests pass now 😎. We could do another PR where we change how num_layers is indexed so it's more intuitive across UNet1D/2D.

natolambert avatar Nov 17 '22 18:11 natolambert

Sorry, overall I'm still a bit confused why we need this PR. Is it just to make the code cleaner or do we need it for a new model?

If it's just to make the code cleaner I'm not really in favor of it actually because:

  • We add new config parameters, which makes the config longer, harder to understand, and less flexible for future changes
  • The code becomes less flexible in the sense that if one wants different kinds of layers in the mid block, the whole logic of layers_per_block breaks a bit, no? I know we use layers_per_block for UNet2D, but that's really because we support a variety of different model types

=> Long story short: if we need this change to support a new model, I'm 100% on board with it (maybe not putting layers_per_block in the mid block, but the rest is OK!), but if we don't need these changes for a new model, I don't think we should make them now.
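For context, a hypothetical sketch (not the actual UNet1DModel signature) of the layers_per_block pattern being debated: a single int keeps the config short but forces a uniform depth on every block, which is exactly the flexibility trade-off raised above.

```python
import torch
import torch.nn as nn

class TinyDownBlock1D(nn.Module):
    def __init__(self, channels: int, num_layers: int):
        super().__init__()
        # The one knob: how many resnet-style layers this block stacks.
        self.resnets = nn.ModuleList(
            nn.Conv1d(channels, channels, 3, padding=1) for _ in range(num_layers)
        )

    def forward(self, x):
        for resnet in self.resnets:
            x = x + resnet(x)
        return x

class TinyUNet1D(nn.Module):
    def __init__(self, channels: int = 32, num_blocks: int = 2, layers_per_block: int = 2):
        super().__init__()
        # One int gives every block the same depth; per-block control would
        # need a tuple, i.e. a longer and more rigid config.
        self.down_blocks = nn.ModuleList(
            TinyDownBlock1D(channels, layers_per_block) for _ in range(num_blocks)
        )

    def forward(self, x):
        for block in self.down_blocks:
            x = block(x)
        return x

out = TinyUNet1D(layers_per_block=3)(torch.randn(1, 32, 64))
print(out.shape)  # torch.Size([1, 32, 64])
```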

patrickvonplaten avatar Nov 18 '22 09:11 patrickvonplaten

@patrickvonplaten this isn't associated with a specific model.

My thoughts:

  • I feel like the layers_per_block arg is just really useful for people trying to build new stuff in the library. I personally would rather it already exist than have to add a feature like that when adding a new model type.
  • I feel less strongly about the mid blocks arg -- that one can wait until there's a clearer need.
  • I think at minimum the renaming of the downsample blocks should be merged. The difference between one with learned convolutions and one with a static kernel should be clear.
  • I see why the layers_per_block arg looks like a breaking change. It was added in the RL PR, so dance diffusion falls back to the defaults (which the defaults in unet1d.py match). Conveniently, that also works for the RL UNet.

Given this, happy to roll back the mid block layers. Does that sound good?

natolambert avatar Nov 18 '22 23:11 natolambert

@patrickvonplaten I made these changes non-breaking and more intuitive. The dance diffusion test confuses me because none of the real logic changes; it seems like a weight-initialization issue, or maybe we need to run the GH Actions on an A100? Not sure.

natolambert avatar Nov 30 '22 00:11 natolambert

@natolambert the failing tests seem to be related

patrickvonplaten avatar Dec 01 '22 17:12 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Dec 26 '22 15:12 github-actions[bot]