How to reproduce result of FID 3.60 over LDM-4-G on ImageNet?
- Should I use cin256-v2.yaml for LDM-4?
- The paper only mentions scale=1.5 and 250 DDIM steps, but doesn't say which eta was used for this result. I tried eta=0 (IS=115, FID=5.04) and eta=1 (IS=157, FID=4.65). Which eta should I use to reproduce the FID of 3.60? (See the sampler sketch after this list for where these knobs enter.)
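
For reference, a minimal sketch of where scale, steps, and eta enter the DDIM sampler, adapted from the class-conditional example in the repo README; the config/checkpoint paths, batch size, and class id are placeholders, not values from this thread:

```python
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config
from ldm.models.diffusion.ddim import DDIMSampler

# Assumed local paths; adjust to your checkout / downloaded checkpoint.
config = OmegaConf.load("configs/latent-diffusion/cin256-v2.yaml")
model = instantiate_from_config(config.model)
sd = torch.load("models/ldm/cin256-v2/model.ckpt", map_location="cpu")["state_dict"]
model.load_state_dict(sd, strict=False)
model = model.cuda().eval()
sampler = DDIMSampler(model)

n, scale, steps, eta = 8, 1.5, 250, 1.0  # eta is the knob in question
with torch.no_grad(), model.ema_scope():
    # Class id 1000 is the extra "unconditional" token used for classifier-free guidance.
    uc = model.get_learned_conditioning(
        {model.cond_stage_key: torch.tensor(n * [1000]).to(model.device)})
    # Hypothetical target class 207, purely for illustration.
    c = model.get_learned_conditioning(
        {model.cond_stage_key: torch.tensor(n * [207]).to(model.device)})
    samples, _ = sampler.sample(S=steps, conditioning=c, batch_size=n,
                                shape=[3, 64, 64], verbose=False,
                                unconditional_guidance_scale=scale,
                                unconditional_conditioning=uc, eta=eta)
    imgs = model.decode_first_stage(samples)          # LDM-4 latents -> RGB
    imgs = torch.clamp((imgs + 1.0) / 2.0, 0.0, 1.0)  # [-1, 1] -> [0, 1]
```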
- Yes
- I successfully reproduced the result by uniformly generating 50 images from each class. Results are shown below (IS computed with torch-fidelity, the other metrics with the guided_diffusion evaluation code; the paper claims the FIDs from the two tools almost coincide):
| cfg with step=250, scale=1.5 | IS↑ | FID↓ | sFID↓ | Prec.↑ | Recall↑ |
|---|---|---|---|---|---|
| Paper (Table 10) | 247.67±5.59 | 3.60 | - | 87% | 48% |
| eta=0 | 205.55±5.27 | 3.31 | 5.10 | 82.95% | 53.57% |
| eta=1 | 249.59±3.30 | 3.54 | 5.10 | 87.15% | 48.50% |
Thus I guess the eta used in the paper is 1.
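
For the IS column above, a sketch of the torch-fidelity call (the samples directory is a placeholder for the 50k generated images, 50 per class):

```python
import torch_fidelity

# Inception Score over a folder of generated PNG/JPEG samples.
metrics = torch_fidelity.calculate_metrics(
    input1="samples/",  # placeholder: directory with the 50k generated images
    cuda=True,
    isc=True,  # IS only; FID/sFID/Prec./Recall came from guided_diffusion here
)
print(metrics["inception_score_mean"], metrics["inception_score_std"])
```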
Hi, thanks for your reply. I have another question: did you use the reference batch provided by guided_diffusion? Does this influence the FID compared to evaluating against the full ImageNet training set?
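
For context: the guided_diffusion evaluator compares a reference-batch `.npz` against a sample-batch `.npz`, so the FID depends on which reference batch you feed it. A sketch of packing samples into the expected format (file and directory names are assumptions; `VIRTUAL_imagenet256_labeled.npz` is the precomputed ImageNet 256 reference batch distributed with guided-diffusion):

```python
import numpy as np
from PIL import Image
from pathlib import Path

# Pack generated images into the uint8 [N, 256, 256, 3] array the
# guided_diffusion evaluator expects; "samples/" is a placeholder directory.
paths = sorted(Path("samples/").glob("*.png"))
batch = np.stack([np.array(Image.open(p).convert("RGB")) for p in paths])
assert batch.dtype == np.uint8 and batch.shape[1:] == (256, 256, 3)
np.savez("sample_batch.npz", batch)  # stored under the default key "arr_0"

# Then, with the reference batch from the guided-diffusion repo:
#   python evaluations/evaluator.py VIRTUAL_imagenet256_labeled.npz sample_batch.npz
```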