DALLE-pytorch
Adding an eval.py
We're starting to have a few trained DALLE models. Right now, people evaluate them by looking at the loss and a few sample outputs. I think we can make evaluation easier and better by writing an eval.py.
It would take as input an evaluation directory in the same format as the training data, generate pictures for all the captions, then compute some metrics. Example metrics:
- Human: just look at the output folder and see if the images look good
- CLIP: compute the CLIP dot product between the text and the generated image, and also between the text and the real image; compute the ratio and average it over the whole eval set
- FID: compute the FID between the generated and real images
- Retrieval: when retrieving from a text among the generated images, does the matching generated image appear in the top 10? Compare with the same measure on the real images
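A minimal sketch of the CLIP ratio and retrieval metrics, assuming the CLIP text and image embeddings have already been computed (one vector per caption, with generated and real images in the same order as the captions). The function names `clip_score_ratio` and `retrieval_recall_at_k` are hypothetical, not part of DALLE-pytorch, and plain Python lists are used to keep it dependency-free:

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def clip_score_ratio(text_emb, gen_emb, real_emb) -> float:
    """Average over the eval set of sim(text, generated) / sim(text, real)."""
    ratios = []
    for t, g, r in zip(text_emb, gen_emb, real_emb):
        t, g, r = normalize(t), normalize(g), normalize(r)
        # Assumes sim(text, real) > 0; a zero or negative value would break the ratio.
        ratios.append(dot(t, g) / dot(t, r))
    return sum(ratios) / len(ratios)


def retrieval_recall_at_k(text_emb, image_emb, k: int = 10) -> float:
    """Fraction of captions whose own image (same index) ranks among the top k images."""
    images = [normalize(v) for v in image_emb]
    hits = 0
    for i, t in enumerate(text_emb):
        t = normalize(t)
        scores = [dot(t, img) for img in images]
        # Indices of the k highest-scoring images for this caption.
        top_k = sorted(range(len(images)), key=lambda j: scores[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(text_emb)
```

Running `retrieval_recall_at_k` once with the generated embeddings and once with the real embeddings gives the comparison described in the Retrieval bullet. The ratio relies on the text/real similarity being positive, which usually holds for matched pairs but is worth checking on your data.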
I think a good first step would be to support only the human metric (i.e., just implement the generation). It would be a small variation of generate.py.
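A rough sketch of that generation-only step, assuming the eval directory follows the training layout where each image shares its file stem with a `.txt` caption file, and treating every non-empty line of a caption file as one caption. `generate_fn` is a hypothetical callable (caption in, encoded image bytes out) that would wrap the model-loading and sampling code from generate.py; it is not an existing DALLE-pytorch function:

```python
from pathlib import Path
from typing import Callable, List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def load_eval_pairs(eval_dir: Path) -> List[Tuple[Path, str]]:
    """Pair each caption with the real image sharing its file stem (assumed training layout)."""
    images = {p.stem: p for p in eval_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}
    pairs = []
    for txt in sorted(eval_dir.glob("*.txt")):
        if txt.stem not in images:
            continue  # caption file without a matching real image
        captions = [line.strip() for line in txt.read_text().splitlines() if line.strip()]
        for caption in captions:
            pairs.append((images[txt.stem], caption))
    return pairs


def generate_eval_set(eval_dir, out_dir, generate_fn: Callable[[str], bytes]) -> int:
    """Write one generated image plus its caption per eval caption into out_dir."""
    eval_dir, out_dir = Path(eval_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for i, (real_path, caption) in enumerate(load_eval_pairs(eval_dir)):
        name = f"{real_path.stem}_{i:05d}"  # stem prefix links back to the real image
        (out_dir / f"{name}.png").write_bytes(generate_fn(caption))
        (out_dir / f"{name}.txt").write_text(caption + "\n")
        count += 1
    return count
```

Writing the caption next to each generated image keeps the output folder easy to review by eye, and the stem prefix makes it straightforward to find the matching real image when the other metrics are added later.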
+1! I started working on this recently; have a look here if you are interested: https://github.com/mehdidc/DALLE_clip_score. I also made a tool for splitting the data into train/test sets, so that we can evaluate on unseen images/captions: https://github.com/mehdidc/DALLE_utils
I have a model trained on CUB_200_2011 as a starting point and will provide some test curves later.
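One simple way to get a reproducible split, sketched here as an illustration and not as how DALLE_utils does it, is to hash each sample's file stem so that an image and its caption file always land in the same set, regardless of file order or machine. `split_stems` is a hypothetical helper:

```python
import hashlib
from typing import Iterable, List, Tuple


def split_stems(stems: Iterable[str], test_fraction: float = 0.1) -> Tuple[List[str], List[str]]:
    """Deterministically assign each sample (image + caption sharing a stem) to train or test."""
    train, test = [], []
    for stem in sorted(stems):
        # Map the stem to a number in [0, 1) via SHA-1; stable across runs and machines.
        digest = hashlib.sha1(stem.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 0x100000000
        (test if bucket < test_fraction else train).append(stem)
    return train, test
```

Because the assignment depends only on the stem, adding new files later does not move existing samples between sets, though near-duplicate images stored under different names could still leak across the split.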
For FID, we can use https://github.com/GaParmar/clean-fid
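For reference, FID fits a Gaussian to Inception features of the real and generated images and compares the two with the standard formula below, where μ_r, Σ_r and μ_g, Σ_g are the feature mean and covariance for the real and generated images respectively (lower is better). clean-fid computes this directly from two image folders (its README shows a `fid.compute_fid` call on two folder paths), so the generated output folder and the real eval images could be passed straight in:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```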
Great idea; I am looking for exactly this kind of functionality. Are there any updates? Thanks!