Global-Flow-Local-Attention

FID score is great, but the SSIM score I got is lower than in previous work

Open PangzeCheung opened this issue 4 years ago • 5 comments

Hello, I have run your code and achieved good results. However, on the fashion dataset (176*256), I only obtained an SSIM of 0.65, and SSIM_256 is 0.68. Is there any problem with the SSIM I got? I am confused because SSIM reached 0.773 in 'Progressive Pose Attention Transfer for Person Image Generation'. Thank you very much.

PangzeCheung avatar Jun 01 '20 15:06 PangzeCheung

Hi, we did not provide SSIM scores in our paper. One reason is that metrics such as SSIM and PSNR cannot accurately evaluate the results of this task. Some explanations and experiments can be found in the paper.
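
For intuition, here is a toy illustration of one well-known failure mode (my own sketch, not the experiments from the paper): pixel-aligned metrics such as SSIM tend to reward blur and penalize small spatial misalignments, which matters for pose transfer where generated pixels are rarely aligned with the ground truth. The file name 'target.png', kernel size, and shift amount are placeholders.

# Toy illustration (not from the paper): SSIM often prefers a blurred image
# over a sharp but slightly shifted one. The image path is a placeholder.
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

gt = cv2.imread('target.png')                # ground-truth person image
blurred = cv2.GaussianBlur(gt, (15, 15), 0)  # over-smoothed "generation"
shifted = np.roll(gt, shift=3, axis=1)       # sharp output, misaligned by 3 px

print('SSIM blurred:', ssim(gt, blurred, channel_axis=2, data_range=255))
print('SSIM shifted:', ssim(gt, shifted, channel_axis=2, data_range=255))
# The blurred image usually scores higher, although it looks worse perceptually.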

RenYurui avatar Jun 02 '20 11:06 RenYurui

@PangzeCheung Did you test on the cropped 176*256 images directly? In PATN there is a cunning trick called "re-padding", as shown here:

import numpy as np

def addBounding(image, bound=40):
    # Pad a 176x256 crop back to 256x256 by adding `bound` columns of
    # white (255) pixels on both the left and the right side.
    h, w, c = image.shape
    image_bound = np.ones((h, w + bound * 2, c)) * 255
    image_bound = image_bound.astype(np.uint8)
    image_bound[:, bound:bound + w] = image

    return image_bound

As you can see, during evaluation the cropped images are re-padded back to 256*256 with all-white (255) pixels. Since the SSIM over the padded area is 1, this trick alone can raise the SSIM scores by around 0.10. By the way, PATN without padding also only reaches 0.68.
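
To make the effect concrete, here is a minimal sketch of the comparison (mine, not the evaluation code from either repo), reusing the addBounding function above; 'gen.png' and 'gt.png' are placeholder paths for one generated / ground-truth 176*256 pair.

# Minimal sketch (not from either repo): SSIM on the 176x256 crops vs. on the
# same images re-padded to 256x256 with white borders.
import cv2
from skimage.metrics import structural_similarity as ssim

gen = cv2.imread('gen.png')  # generated 176x256 image (placeholder path)
gt = cv2.imread('gt.png')    # ground-truth 176x256 image (placeholder path)

print('SSIM on 176x256 crops:',
      ssim(gt, gen, channel_axis=2, data_range=255))
print('SSIM on re-padded 256x256:',
      ssim(addBounding(gt), addBounding(gen), channel_axis=2, data_range=255))
# The white borders are identical in both images, so that region contributes
# an SSIM of ~1 and lifts the average -- roughly the ~0.10 gap described above.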

Incidentally, @RenYurui's claim about the unreliability of SSIM is absolutely right. I found more evidence while carrying out a two-stage pose transfer, where both the SSIM and the Inception score drop after the stage-II enhancement, yet the perceptual quality is dramatically improved. You can find more details in this paper.

Lotayou avatar Jun 04 '20 08:06 Lotayou

@Lotayou Yes, I tested on the cropped 176*256 images directly. I hadn't noticed that in PATN's paper. Thank you for your kind answer!

PangzeCheung avatar Jun 04 '20 13:06 PangzeCheung

@Lotayou Great work! So, in your paper, are both the SSIM and FID scores calculated on the re-padded images, or is FID calculated on the cropped 176*256 images directly? Thank you very much!

PangzeCheung avatar Jun 09 '20 16:06 PangzeCheung

Yep, we stick to the PATN evaluation scheme for a fair comparison (fair for our method, that is). I guess this trick only affects statistical measures such as PSNR and SSIM, while FID and LPIPS are evaluated on deep semantic features, so the padding shouldn't make too much of a difference.
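
One quick way to sanity-check that claim is to compare a deep metric such as LPIPS on the cropped and re-padded versions of the same pair. The sketch below is mine (not the evaluation code from any of the papers); it reuses the addBounding function from the earlier comment, and 'gen.png' / 'gt.png' are placeholder paths.

# Sketch (not from any of the papers): does white re-padding change LPIPS much?
import cv2
import numpy as np
import torch
import lpips

def to_tensor(img_bgr):
    # HWC uint8 BGR -> 1x3xHxW float tensor in [-1, 1] (RGB), as LPIPS expects.
    rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 127.5 - 1.0
    return torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)

loss_fn = lpips.LPIPS(net='alex')  # deep perceptual metric

gen = cv2.imread('gen.png')  # placeholder paths for one generated / GT pair
gt = cv2.imread('gt.png')

with torch.no_grad():
    d_crop = loss_fn(to_tensor(gen), to_tensor(gt)).item()
    d_pad = loss_fn(to_tensor(addBounding(gen)), to_tensor(addBounding(gt))).item()

print('LPIPS cropped  :', d_crop)
print('LPIPS re-padded:', d_pad)
# If the claim above holds, the two LPIPS values should be much closer to each
# other than the corresponding SSIM gap, since the identical white borders
# contribute little to the deep feature differences.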

Lotayou avatar Jun 10 '20 09:06 Lotayou