CompVis/latent-diffusion

Evaluation code on the COCO dataset

Open • canqin001 opened this issue (#88) • 9 comments

Dear authors,

I noticed that COCO is an essential benchmark for evaluating text-to-image generation. Could you share the evaluation code you used to compute IS and FID on COCO?

Thank you so much!

canqin001 • Jun 17, 2022
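For reference, a minimal NumPy sketch of the Inception Score as it is usually defined, IS = exp(E_x[KL(p(y|x) || p(y))]), averaged over splits. It assumes you already have an (N, K) array of softmax outputs from an Inception classifier run on the generated images; the function name, the `splits` default, and the epsilon are illustrative choices, not the authors' evaluation code.

```python
import numpy as np


def inception_score(probs, splits=10, eps=1e-12):
    """Inception Score from an (N, K) array of softmax class probabilities.

    Returns (mean, std) of exp(mean_x KL(p(y|x) || p(y))) over the splits.
    """
    probs = np.asarray(probs, dtype=np.float64)
    scores = []
    for chunk in np.array_split(probs, splits):
        # Marginal class distribution p(y), estimated on this split only
        p_y = chunk.mean(axis=0, keepdims=True)
        # Per-sample KL divergence between p(y|x) and p(y)
        kl = np.sum(chunk * (np.log(chunk + eps) - np.log(p_y + eps)), axis=1)
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))
```

Here `probs` would come from running the classifier on every generated image; the original IS code reports the mean and standard deviation over 10 splits.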

Have you computed FID on COCO? I evaluated the released model on COCO and got an FID of 134, which is clearly not right.

zengxianyu • Jun 22, 2022

I used pytorch-fid (https://github.com/mseitzer/pytorch-fid) to compute FID on COCO. The FID is around 19, which is still higher than the reported results.
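pytorch-fid fits a Gaussian to the 2048-d Inception pool features of each image set and reports the Fréchet distance between the two fits, ||mu1 - mu2||^2 + Tr(Sigma1 + Sigma2 - 2 (Sigma1 Sigma2)^(1/2)). The sketch below reimplements only that last step with NumPy/SciPy, assuming the features have already been extracted; `frechet_distance` and `fid_from_features` are hypothetical helper names, not the package's API. The package itself is run from the command line as `python -m pytorch_fid path/to/dataset1 path/to/dataset2`.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Frechet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.atleast_1d(mu1), np.atleast_1d(mu2)
    sigma1, sigma2 = np.atleast_2d(sigma1), np.atleast_2d(sigma2)
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if not np.isfinite(covmean).all():
        # Product is (near-)singular: add a small ridge to both covariances
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
    # Round-off can leave tiny imaginary parts; keep the real part only
    covmean = np.real(covmean)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean))


def fid_from_features(feats1, feats2):
    """FID between two (N, D) arrays of pre-extracted Inception features."""
    mu1, sigma1 = feats1.mean(axis=0), np.cov(feats1, rowvar=False)
    mu2, sigma2 = feats2.mean(axis=0), np.cov(feats2, rowvar=False)
    return frechet_distance(mu1, sigma1, mu2, sigma2)
```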

canqin001 • Jun 22, 2022

I also used that repo. Did you evaluate on the COCO validation set or the training set? How many samples did you use? I was not able to get anything close to a reasonable score.

zengxianyu • Jun 22, 2022

That is a good question. I had forgotten about the gap between the train and val sets. Against the val set, the FID is 19.11; against the train set it is 12.79 (i.e., between images generated from val-set captions and the train-set ground-truth images). The latter seems to match the score reported in the paper, but I am still surprised by such a large gap.
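Since the score depends this much on the reference set, it also helps to pin down the prompt list so runs are reproducible. A sketch, assuming the standard COCO captions JSON layout (an `annotations` list whose entries carry `image_id` and `caption`); taking one random caption per image with a fixed seed is an assumption, not necessarily the paper's protocol, and the file path in the comment is hypothetical.

```python
import json
import random


def sample_prompts(captions, max_images=None, seed=0):
    """Pick one caption per image from a COCO-captions style dict.

    Returns a list of (image_id, caption) pairs. Image ids are sorted before
    subsampling so the selection is reproducible for a given seed.
    """
    by_image = {}
    for ann in captions["annotations"]:
        by_image.setdefault(ann["image_id"], []).append(ann["caption"].strip())
    rng = random.Random(seed)
    image_ids = sorted(by_image)
    if max_images is not None:
        image_ids = rng.sample(image_ids, min(max_images, len(image_ids)))
    # One randomly chosen caption per selected image
    return [(i, rng.choice(by_image[i])) for i in image_ids]


def load_prompts(path, max_images=None, seed=0):
    # Hypothetical path, e.g. "annotations/captions_val2014.json"
    with open(path) as f:
        return sample_prompts(json.load(f), max_images, seed)
```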

canqin001 • Jun 22, 2022

Did you evaluate on the full validation and training sets? Inference is slow, so I only ran the model on a small subset.
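Scores from a small subset are not directly comparable with full-set numbers, because FID estimated from fewer samples is biased upward. A quick synthetic check, using random Gaussian vectors as a stand-in for Inception features (the 64-d size and the sample counts are arbitrary choices): both sets come from the same distribution, so the true distance is 0, yet the estimate only shrinks toward 0 as n grows.

```python
import numpy as np
from scipy import linalg


def fid(x, y):
    """FID between two (N, D) feature arrays via their Gaussian fits."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    s_x, s_y = np.cov(x, rowvar=False), np.cov(y, rowvar=False)
    # Drop tiny imaginary parts that sqrtm can return due to round-off
    covmean = np.real(linalg.sqrtm(s_x @ s_y))
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(s_x + s_y - 2 * covmean))


rng = np.random.default_rng(0)
dim = 64  # stand-in for the 2048-d Inception pool features
results = {}
for n in (100, 1000, 10000):
    # Both sets come from the SAME distribution, so the true FID is 0
    a = rng.standard_normal((n, dim))
    b = rng.standard_normal((n, dim))
    results[n] = fid(a, b)
    print(f"n={n:>5}: FID estimate {results[n]:.3f}")
```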

zengxianyu • Jun 22, 2022

Yes, I evaluated on the full sets. It takes several hours to run.

canqin001 • Jun 22, 2022

@canqin001 How do you preprocess the GT images for evaluation? Do you simply resize each image to 256x256, or resize the short edge to 256 and then center-crop to 256x256?
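The two options differ in whether the aspect ratio is preserved. A Pillow sketch of both; bicubic resampling and the function names are assumptions, since the thread does not say which filter (or which option) the authors used. Whichever option is chosen, it presumably has to be applied the same way to every reference image for scores to be comparable.

```python
from PIL import Image


def resize_only(img, size=256):
    """Option 1: squash the whole image to size x size (aspect ratio not kept)."""
    return img.convert("RGB").resize((size, size), Image.BICUBIC)


def resize_center_crop(img, size=256):
    """Option 2: resize the short edge to `size`, then center-crop size x size."""
    img = img.convert("RGB")
    w, h = img.size
    scale = size / min(w, h)
    # max() guards against rounding the short edge just below `size`
    new_w, new_h = max(size, round(w * scale)), max(size, round(h * scale))
    img = img.resize((new_w, new_h), Image.BICUBIC)
    left, top = (new_w - size) // 2, (new_h - size) // 2
    return img.crop((left, top, left + size, top + size))
```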

CrossLee1 • Jun 27, 2022

Do you have an LDM pretrained on COCO? I use the same evaluation code and get an FID above 100 on the 256x256 validation set, and I think the reason is that my LDM was not trained on COCO. What about yours?

yumadara • Oct 31, 2022

@CrossLee1 Regarding your question about how the GT images are processed for evaluation (resize each image directly to 256x256, or resize the short edge to 256 and center-crop to 256x256): I have the same question. Have you solved it?

XingtongGe • May 17, 2024