Global-Flow-Local-Attention
FID score is great, but SSIM score I got is lower than previous work
Hello, I have run your code and achieved good results. But on the Fashion dataset (176*256) I only obtained SSIM 0.65, and SSIM_256 is 0.68. Is there a problem with the SSIM I got? In 'Progressive Pose Attention Transfer for Person Image Generation', SSIM reached 0.773. I am very confused about this~ Thank you very much.
Hi, we did not provide SSIM scores in our paper. One reason is that metrics such as SSIM and PSNR cannot accurately evaluate the results of this task. Some explanations and experiments can be found in our paper.
@PangzeCheung Did you test on the cropped 176*256 images directly? PATN uses a cunning trick called "re-padding", as shown here:
import numpy as np

def addBounding(image, bound=40):
    # Re-pad a cropped (H, W, C) image back to full width with white (255) borders.
    h, w, c = image.shape
    image_bound = np.ones((h, w + bound * 2, c), dtype=np.uint8) * 255
    image_bound[:, bound:bound + w] = image
    return image_bound
As you can see, during evaluation the cropped images are re-padded back to 256*256 with all-white (255) pixels. Since the SSIM of the padded area is exactly 1, this trick raises the SSIM score by around 0.10. By the way, PATN without padding also only reaches 0.68.
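The effect is easy to reproduce. Below is a minimal sketch (not the actual PATN evaluation script) that compares SSIM on cropped images versus white-re-padded ones, using `skimage.metrics.structural_similarity` and random grayscale images as stand-ins for generated/ground-truth pairs:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def add_bounding(image, bound=40):
    # Re-pad a cropped (H, W) grayscale image to (H, W + 2*bound) with white pixels,
    # mirroring PATN's addBounding trick.
    h, w = image.shape
    out = np.full((h, w + 2 * bound), 255, dtype=np.uint8)
    out[:, bound:bound + w] = image
    return out

rng = np.random.default_rng(0)
# Two dissimilar stand-in images at the cropped 176x256 resolution (H=256, W=176).
gt = rng.integers(0, 256, (256, 176), dtype=np.uint8)
gen = rng.integers(0, 256, (256, 176), dtype=np.uint8)

s_crop = ssim(gt, gen, data_range=255)
s_pad = ssim(add_bounding(gt), add_bounding(gen), data_range=255)
print(s_crop, s_pad)
```

The padded score always comes out higher: the white borders of the two padded images match exactly, so every window in that region contributes an SSIM of 1 to the mean.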
Incidentally, @RenYurui's point about the unreliability of SSIM is absolutely right. I found more evidence by carrying out a two-stage pose transfer, where both the SSIM and Inception Score drop after the stage-II enhancement, yet the perceptual quality is dramatically improved. You can find more details in this paper.
@Lotayou Yes, I tested on the cropped 176*256 images directly. I didn't notice that test setting in PATN's paper. Thank you for your kind answer!
@Lotayou Great work! So, in your paper, are the SSIM and FID scores both calculated on the re-padded images, or is FID calculated on the cropped 176*256 images directly? Thank you very much!
Yep, we stick to the PATN evaluation scheme for a fair comparison (fair for our method, that is). I suspect this trick only affects statistical measures such as PSNR and SSIM, while FID and LPIPS are computed on deep semantic features, so the padding shouldn't cause much of a difference there.