improved-diffusion

EMA sampling produces only noise

Open NicolasNerr opened this issue 2 years ago • 9 comments

Hi,

I am training on a dataset of 500 images (I know it's not a lot), with diffusion_steps=4000 and the basic parameters proposed in the README.

Using the first model checkpoint at 10K steps, sampling produces only noise (with the same number of diffusion steps, or with 250 or 1000, nothing changes). However, the training loss is already at 0.03.

Do you have an idea of what is happening here?

Automatic sampling every X training steps would be greatly appreciated, to avoid this kind of surprising issue.

Thanks a lot !

NicolasNerr, Nov 09 '22

Where did you find the checkpoint? Is it created during training or once training has finished? There should be a /temp directory, but even after 45,000 steps none was created (or in which directory did you find it?).

PatrickSVM, Nov 11 '22

I found the checkpoints in tmp/openai-XXX/, the directory corresponding to the right training run. I tried sampling from the 10,000-step and 20,000-step models.

NicolasNerr, Nov 15 '22

Actually, it seems that sampling with the EMA version of the model is what produces only noisy results.

NicolasNerr, Nov 17 '22

Did you obtain good samples when using the non-ema models?

JesseWiers, Jan 20 '23

So basically: EMA is the exponential moving average of the parameters, so it makes sense that the EMA checkpoints from the first few thousand steps produce only noise. The checkpoints named "model" hold the most recently updated copy of the parameters, which is why they show results at an earlier training step.

PatrickSVM, Jan 20 '23
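To see why early EMA checkpoints can still look like noise, here is a minimal sketch under a few assumptions: an EMA decay rate of 0.9999 (to my knowledge the `ema_rate` default in improved-diffusion's training script), an EMA copy that starts from the randomly initialized weights, and one update per training step of the form `ema = rate * ema + (1 - rate) * param`. Under those assumptions the EMA still places `rate ** steps` of its weight on the random initialization: roughly 37% after 10K steps and roughly 14% after 20K steps, even if the raw model has already learned something. The helper names below are hypothetical.

```python
# Sketch (assumption): EMA copy initialized from the random initial weights and
# updated once per step as ema = rate * ema + (1 - rate) * param.


def init_weight_remaining(rate: float, steps: int) -> float:
    """Fraction of the EMA that is still the initial (random) parameters."""
    return rate ** steps


def ema_track(rate: float, steps: int, target: float = 1.0, init: float = 0.0) -> float:
    """Simulate an EMA of a parameter that jumps from `init` to `target` at step 0."""
    ema = init
    for _ in range(steps):
        ema = rate * ema + (1.0 - rate) * target
    return ema


if __name__ == "__main__":
    for n in (1_000, 10_000, 20_000, 50_000):
        frac = init_weight_remaining(0.9999, n)
        print(f"{n:>6} steps: {frac:.3f} of the EMA is still random init")
```

With `target=1.0` and `init=0.0`, `ema_track` returns `1 - rate ** steps`, i.e. how far the EMA has moved toward the trained value. This is why training longer lets the EMA weights catch up with the "model" weights.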

I have the same problem. Can you explain the reason? Thank you.

pfeducode, Mar 07 '23

As I said, look up EMA: the exponential-moving-average parameters change very slowly compared to the "model" parameters. Fix: train (far) longer, or sample with the "model" weights instead of the EMA weights.

PatrickSVM, Mar 07 '23


I have tried both the model and the EMA checkpoints, and both produce pure noise. There is also an opt checkpoint; I don't know what it is for.

pfeducode, Mar 07 '23

opt is not a model; it is the optimizer state, used if you want to resume training.

PatrickSVM, Mar 07 '23
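To tell the three kinds of files in the log directory apart (for example `tmp/openai-XXX/`), here is a small sketch. It assumes the training loop names checkpoints `model{step:06d}.pt`, `ema_{rate}_{step:06d}.pt` and `opt{step:06d}.pt`; that matches my reading of improved-diffusion's `train_util.py`, but check your own directory. `classify_checkpoint` and `pick_sampling_checkpoint` are hypothetical helpers, not part of the repository.

```python
import re
from typing import Iterable, Optional, Tuple

# ASSUMPTION: naming scheme used by improved-diffusion's training loop.
#   model{step:06d}.pt        raw ("model") weights
#   ema_{rate}_{step:06d}.pt  EMA weights for one EMA rate
#   opt{step:06d}.pt          optimizer state (only for resuming training)
_PATTERNS = (
    ("model", re.compile(r"^model(\d{6,})\.pt$")),
    ("ema", re.compile(r"^ema_([0-9.]+)_(\d{6,})\.pt$")),
    ("opt", re.compile(r"^opt(\d{6,})\.pt$")),
)


def classify_checkpoint(filename: str) -> Optional[Tuple[str, int]]:
    """Return (kind, step) for a checkpoint file name, or None if unrecognized."""
    for kind, pattern in _PATTERNS:
        match = pattern.match(filename)
        if match:
            # The training step is always the last captured group.
            return kind, int(match.group(match.lastindex))
    return None


def pick_sampling_checkpoint(filenames: Iterable[str], prefer: str = "model") -> Optional[str]:
    """Return the latest checkpoint of kind `prefer` ("model" or "ema"), or None."""
    if prefer not in ("model", "ema"):
        # Optimizer state holds no network weights, so it cannot be sampled from.
        raise ValueError("prefer must be 'model' or 'ema'")
    candidates = []
    for name in filenames:
        info = classify_checkpoint(name)
        if info is not None and info[0] == prefer:
            candidates.append((info[1], name))
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    files = [
        "model010000.pt", "ema_0.9999_010000.pt", "opt010000.pt",
        "model020000.pt", "ema_0.9999_020000.pt", "opt020000.pt",
    ]
    print(pick_sampling_checkpoint(files))  # prints model020000.pt
```

The chosen file would then go to the README's sampling command together with the same model and diffusion flags used for training, e.g. `python scripts/image_sample.py --model_path tmp/openai-XXX/model020000.pt $MODEL_FLAGS $DIFFUSION_FLAGS`.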