stable-diffusion
stable-diffusion copied to clipboard
How to compute the FID of text-image models?
I'm curious about what dataset does FIDs shown in CLIP-FID curve computed with? And are the text-image FIDs reported in https://arxiv.org/abs/2112.10752 be computed use one of these four models? I would be very grateful if someone can tell me~