DALLE-pytorch Adding a eval.py

We're starting to have a few trained DALLE models. Currently, people evaluate them by looking at the loss and a few examples. I think we can make evaluation easier and better by writing an eval.py.

It would take as input an evaluation directory in the same format as the training data, generate pictures for all the texts, then compute some metrics. Example metrics:

Human: just look at the output folder and see if it looks good
CLIP: compute the CLIP dot product between text and generated image and between text and real image, take the ratio, and average it over the whole eval set
FID: compute the FID metric on generated vs. real images
Retrieval: when retrieving from the generated images with a text query, does the matching generated image appear in the top 10? Compare with the same retrieval on the real images
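
To make the CLIP and retrieval metrics concrete, here is a rough sketch in plain Python. It assumes the text, generated-image and real-image embeddings have already been computed with CLIP elsewhere and are passed in as lists of floats, aligned by index. The function names (`cosine`, `clip_ratio`, `recall_at_k`) are made up for illustration, and it uses cosine similarity (a normalized dot product) rather than the raw dot product mentioned above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def clip_ratio(text_embs, gen_embs, real_embs):
    """Average over the eval set of sim(text, generated) / sim(text, real).

    Assumes sim(text, real) is positive for every sample; a real
    implementation would need to handle zero or negative similarities.
    """
    ratios = [
        cosine(t, g) / cosine(t, r)
        for t, g, r in zip(text_embs, gen_embs, real_embs)
    ]
    return sum(ratios) / len(ratios)


def recall_at_k(text_embs, image_embs, k=10):
    """Fraction of texts whose matching image (same index) ranks in the top k."""
    hits = 0
    for i, t in enumerate(text_embs):
        sims = [cosine(t, img) for img in image_embs]
        # Image indices sorted from most to least similar to this text.
        ranked = sorted(range(len(image_embs)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_embs)
```

Calling `recall_at_k` once with the generated-image embeddings and once with the real-image embeddings gives the comparison described in the retrieval metric.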

I think having only the human metric (just implementing the generation) would be a good first step. It would be a small variation of generate.py.
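
A rough sketch of what that first step could look like, under some assumptions: the evaluation directory is taken to hold captions as `.txt` files next to images with the same file stem, and `generate_fn` is a hypothetical callable (for example, a thin wrapper around whatever generate.py does) that writes an image for a given text to a given path. None of these names come from the repository.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the dataset uses.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def find_pairs(eval_dir):
    """Return sorted (caption_path, image_path) pairs that share a file stem."""
    root = Path(eval_dir)
    images = {p.stem: p for p in root.rglob("*") if p.suffix.lower() in IMAGE_EXTS}
    captions = {p.stem: p for p in root.rglob("*.txt")}
    # Keep only stems that have both a caption and a real image.
    return [(captions[s], images[s]) for s in sorted(captions.keys() & images.keys())]


def run_eval(eval_dir, out_dir, generate_fn):
    """Generate one image per caption and save it under out_dir.

    generate_fn(text, target_path) is a hypothetical hook that must write
    the generated image to target_path.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for caption_path, _real_image in find_pairs(eval_dir):
        text = caption_path.read_text().strip()
        target = out / f"{caption_path.stem}.png"
        generate_fn(text, target)
        written.append(target)
    return written
```

Because each output is named after its caption's stem, the generated folder lines up file by file with the real images, which is convenient both for eyeballing the results and for the other metrics later.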

May 29 '21 15:05 rom1504

+1! I started to work on this recently, have a look here if you are interested: https://github.com/mehdidc/DALLE_clip_score. I also made a tool for splitting the data into train/test, so that we can evaluate on unseen images/captions: https://github.com/mehdidc/DALLE_utils

I have a model trained on CUB_200_2011 as a starting point, will provide some test curves later.

Jun 11 '21 08:06 mehdidc

For FID, we can use https://github.com/GaParmar/clean-fid

Jun 14 '21 06:06 mehdidc

Great idea, I have been looking for this kind of functionality. Are there any updates? Thanks

Dec 20 '21 05:12 shizhediao
