stable-diffusion
stable-diffusion copied to clipboard
Evaluation issuse
Thank you for your great work! But i have some question when calculating text-to-image fid. In the appendix <E.3.2 Text-to-Image Synthesis>, I notice that you said that fid is calculated by comparing generated samples with 30000 samples from the validation set of the MS-COCO dataset. But in the paper <Zero-Shot Text-to-Image Generation> which you followed, the fid is calculated by comparing 30000 generated samples with the validation set of the MS-COCO dataset.
What is the difference?