GP-UNIT
How to calculate FID and LPIPS
Dear authors, I have a question about the FID and LPIPS evaluation. Which dataset is used to compute these metrics, and under what settings? For example, how many images are used, and do they come from the training set or the testing set? What are the sources of real_images and fake_images for a task such as cat2dog? Could you share more details about the evaluation? Looking forward to your reply!
We use a modified version of this code for evaluation: https://github.com/clovaai/stargan-v2/blob/875b70a150609e8a678ed8482562e7074cdce7e5/metrics/eval.py
Fake images are generated from the testing set of the dataset. For cat2dog, there are 500 testing images (https://github.com/clovaai/stargan-v2/blob/master/README.md#animal-faces-hq-dataset-afhq).
For FID, the real images are the training images in the dataset.
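For readers setting this up, here is a minimal sketch of how the real and fake folders could be paired per task before they are passed to an FID routine. It is not the authors' code: the function name `fid_folder_pairs` and the directory paths are hypothetical, and it assumes that the "real" set for a task is the training folder of the target domain, as is common in StarGAN v2 style evaluation.

```python
import os


def fid_folder_pairs(train_dir, eval_dir, domains):
    """Map each translation task to the (real, fake) folders compared by FID.

    Assumption: real images come from the training split of the *target*
    domain; fake images are translations of the source domain's test images,
    saved under eval_dir/<src>2<trg>. All paths here are illustrative.
    """
    pairs = {}
    for trg in domains:
        for src in domains:
            if src == trg:
                continue  # no translation task from a domain to itself
            task = f"{src}2{trg}"
            pairs[task] = (
                os.path.join(train_dir, trg),  # real: training images
                os.path.join(eval_dir, task),  # fake: generated from test images
            )
    return pairs


# Hypothetical layout for the cat/dog subset of AFHQ.
pairs = fid_folder_pairs("data/afhq/train", "expr/eval", ["cat", "dog"])
for task, (real_dir, fake_dir) in pairs.items():
    print(f"{task}: real={real_dir} fake={fake_dir}")
```

Each pair would then be handed to whichever FID implementation you use, for example the one in the linked eval.py.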
Thank you for your answer