
How to reproduce the FID 3.60 result for LDM-4-G on ImageNet?

Open ThisisBillhe opened this issue 1 year ago • 4 comments

  1. Should I use cin256-v2.yaml for LDM-4?
  2. The paper only mentioned scale=1.5 and step=250 but didn't mention the eta for this result. I tried using eta=0 (IS=115, FID=5.04) and eta=1 (IS=157, FID=4.65). What eta should I use to reproduce a result of FID 3.60?
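For context on the eta parameter: in DDIM sampling, eta scales the per-step noise, with eta=0 giving fully deterministic DDIM and eta=1 recovering DDPM-like variance. A minimal sketch of the sigma schedule (plain NumPy; the alpha schedule here is illustrative, the real one comes from the model config):

```python
import numpy as np

# Illustrative cumulative-alpha schedule (decreasing in t);
# the actual values come from the trained model's noise schedule.
alphas_cumprod = np.linspace(0.9999, 0.005, 250)

def ddim_sigma(eta, a_t, a_prev):
    """DDIM per-step noise scale (DDIM paper, Eq. 16).

    eta=0 -> sigma=0, deterministic sampling;
    eta=1 -> DDPM-like stochastic sampling.
    """
    return eta * np.sqrt((1 - a_prev) / (1 - a_t)) * np.sqrt(1 - a_t / a_prev)

a_t, a_prev = alphas_cumprod[100], alphas_cumprod[99]
print(ddim_sigma(0.0, a_t, a_prev))      # 0.0: no injected noise
print(ddim_sigma(1.0, a_t, a_prev) > 0)  # True: noise injected each step
```

Since eta changes the sample distribution, it is plausible that it shifts IS/FID, which is why pinning down the paper's value matters.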

ThisisBillhe avatar Apr 12 '23 12:04 ThisisBillhe

  1. Yes
  2. I successfully reproduced the result by uniformly generating 50 images from each class. Results are shown below (IS from torch-fidelity, the others from the guided_diffusion evaluation code; the paper claims that FID computed by the two tools almost coincides):
| cfg with steps=250, scale=1.5 | IS↑ | FID↓ | sFID↓ | Prec.↑ | Recall↑ |
|---|---|---|---|---|---|
| Paper (Table 10) | 247.67±5.59 | 3.60 | - | 87% | 48% |
| eta=0 | 205.55±5.27 | 3.31 | 5.10 | 82.95% | 53.57% |
| eta=1 | 249.59±3.30 | 3.54 | 5.10 | 87.15% | 48.50% |

Thus I guess the eta used in the paper is 1.
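For anyone reproducing this, "uniformly generate 50 images from each class" means a balanced set of 50,000 samples (50 × 1000 ImageNet classes). A hedged sketch of the label loop; `sample_batch` is a hypothetical placeholder for the repo's `DDIMSampler.sample()` call with steps=250, scale=1.5, eta=1:

```python
import numpy as np

num_classes, per_class, batch = 1000, 50, 50

# Balanced label set: exactly 50 samples for each of the 1000 classes.
labels = np.repeat(np.arange(num_classes), per_class)

for i in range(0, len(labels), batch):
    class_batch = labels[i:i + batch]
    # images = sample_batch(class_batch)  # hypothetical wrapper around
    #                                     # DDIMSampler.sample(); not runnable here
```

The balanced class distribution matters because FID/IS against the ImageNet reference assume roughly uniform class coverage.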

Jiang-Stan avatar Jul 14 '23 06:07 Jiang-Stan

Hi. Thanks for your reply. I have another question: did you use the reference batch provided by guided_diffusion? Does that influence the FID results compared to evaluating against statistics from the entire ImageNet training set?
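The question is well founded: FID is computed against the mean and covariance of a reference set, so swapping guided_diffusion's reference batch for full training-set statistics can shift the number. A minimal sketch of the Fréchet distance itself (standard formula, synthetic statistics for illustration) showing that the same generated statistics score differently against two references:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, cov1, mu2, cov2):
    # FID between two Gaussians:
    # ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*(C1 @ C2)^{1/2})
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard numerical imaginary residue
    diff = mu1 - mu2
    return diff @ diff + np.trace(cov1 + cov2 - 2 * covmean)

# Same "generated" statistics, two different reference statistics.
mu_gen, cov_gen = np.zeros(2), np.eye(2)
fd_a = frechet_distance(mu_gen, cov_gen, 0.1 * np.ones(2), np.eye(2))
fd_b = frechet_distance(mu_gen, cov_gen, 0.2 * np.ones(2), np.eye(2))
print(fd_a, fd_b)  # the score depends on which reference you use
```

So the two evaluation setups are only comparable if their reference statistics are (near-)identical, which is why using the published guided_diffusion reference batch is the safest way to match reported numbers.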

ThisisBillhe avatar Nov 15 '23 03:11 ThisisBillhe