denoising-diffusion-pytorch

Why are ema updates not reflected on the online model?

Open tuttyfrutyee opened this issue 3 years ago • 3 comments

As far as I understand from the code, the EMA updates affect only the model used for the final sampling, not the online model that is being trained.

If the assumption (and the observation) is that the EMA weights at the end of training, and most likely already after the first few epochs, give better samples, why not copy them into the online model every now and then so that training benefits from the better weights?

tuttyfrutyee, Jul 15 '22 20:07

I don't get your question... Why would EMA affect the data? Doesn't it only adjust the model parameters?

pengzhangzhi, Oct 18 '22 03:10

Yes, it adjusts the model parameters, but in the current setup the EMA weights are only used at the end of training to obtain higher-quality images. What I am asking is: why not also apply them (that is, update the online model's weights) during training, maybe not at every epoch but every few epochs?
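To make the proposal concrete, here is a minimal sketch of what "copying the EMA weights back every few steps" could look like. It is not something the repository does: the function name, the `sync_every` parameter, and the plain dicts of floats standing in for parameter tensors are all assumptions made for illustration.

```python
def train_with_periodic_ema_sync(steps, sync_every, beta=0.9):
    """Toy training loop that overwrites the online weights with the EMA weights.

    Hypothetical illustration only: dicts of floats stand in for parameter tensors,
    and `online["w"] += 1.0` stands in for a real optimizer step.
    """
    online = {"w": 0.0}
    ema = dict(online)  # EMA copy starts as a clone of the online weights

    for step in range(1, steps + 1):
        online["w"] += 1.0  # stand-in for one optimizer update

        # Standard EMA update: only the EMA copy is written to.
        for name in online:
            ema[name] = beta * ema[name] + (1.0 - beta) * online[name]

        # The proposed extra step: pull the online model back onto the EMA weights.
        if step % sync_every == 0:
            online.update(ema)

    return online, ema


synced_online, synced_ema = train_with_periodic_ema_sync(steps=4, sync_every=2)
plain_online, plain_ema = train_with_periodic_ema_sync(steps=3, sync_every=10)
```

One thing the sketch leaves out is optimizer state: with a real optimizer, momentum or Adam statistics would still describe the trajectory from before the sync, so the effect on training may differ from what this toy loop suggests.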

tuttyfrutyee, Oct 18 '22 10:10

Hi @tuttyfrutyee, I presume you are referring to this file and these lines, which I also report here:

              if accelerator.is_main_process:
                  self.ema.update()

ema.update in turn calls this, and the following gets executed:

self.update_moving_average(self.ema_model, self.model)

Therefore the EMA updates your model (after the specified number of steps).
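For readers unsure which of the two models gets written to, here is a minimal sketch of a typical moving-average update with the same argument order as the call above (EMA model first, online model second). It is a standalone stand-in, not the library's implementation: the function body, the `beta` value, and the dicts of floats used in place of parameter tensors are assumptions.

```python
def update_moving_average(ema_params, online_params, beta=0.995):
    """Blend the online weights into the EMA copy, in place.

    Only ema_params is written to; online_params is only read.
    """
    for name, online_value in online_params.items():
        ema_params[name] = beta * ema_params[name] + (1.0 - beta) * online_value


# Toy "models": plain dicts of floats standing in for parameter tensors.
online = {"w": 1.0, "b": 0.0}
ema = dict(online)  # EMA copy starts as a clone of the online weights

# Pretend the optimizer moved the online weights.
online["w"] = 2.0
online_before = dict(online)

update_moving_average(ema, online, beta=0.9)

ema_moved = ema["w"] != 1.0                  # EMA drifted toward the online weights
online_untouched = online == online_before   # online weights were not modified
```

In an update of this form the information flows from the online model into the EMA copy, and the online weights stay exactly as the optimizer left them.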

I actually have a problem related to the updating of the original model, which maybe @pengzhangzhi or anyone else could help me with: I noticed that the images generated during training, which are sampled with the EMA model, are FAR better than the ones generated with the 'real' model loaded after training completed. For now I solved the issue by loading the EMA model again instead of the actual one, but I was wondering why that is. Should I change the power param in the EMA model, choose a different (smaller) beta, or update the model more frequently?
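As background for the beta question, the sketch below shows how strongly a fixed-beta EMA leans on recent versus older online weights. It assumes the plain update `ema = beta * ema + (1 - beta) * online` with a constant beta and ignores any warmup schedule; the helper name and the beta values are illustrative, not taken from the repository.

```python
def ema_weight_on_recent_steps(beta, n_steps):
    """Fraction of the EMA that comes from the most recent n_steps online snapshots.

    With a constant beta, the snapshot from k steps ago carries weight
    (1 - beta) * beta**k, so the last n_steps snapshots sum to 1 - beta**n_steps.
    """
    return 1.0 - beta ** n_steps


# Two hypothetical settings, compared over the same 200-step window.
share_fast = ema_weight_on_recent_steps(beta=0.995, n_steps=200)
share_slow = ema_weight_on_recent_steps(beta=0.9999, n_steps=200)
```

A smaller beta makes the EMA follow the online model more closely, while a beta very close to 1 averages over a much longer history. The rough rule of thumb is an averaging window on the order of 1 / (1 - beta) steps.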

Mat-Po, Apr 20 '23 14:04