
How to do quantitative evaluation?

Open dopu2k16 opened this issue 3 years ago • 24 comments

I wanted to know how you calculated the Warped/Blended (IoU), SSIM, LPIPS, and IS metrics for the quantitative evaluation, and where exactly in the code the quantitative evaluation is implemented.

dopu2k16 avatar Oct 02 '20 14:10 dopu2k16

Hey, did you find a way to measure LPIPS? I learned that IS cannot be used as a proper evaluation metric for this task. For now we can only do SSIM and IoU.

Bouncer51 avatar Oct 23 '20 14:10 Bouncer51

Quantitative evaluation code is not included in this repository. We used the official codebases for these metrics; please refer to them.

minar09 avatar Dec 11 '20 13:12 minar09

Hello! I have encountered many problems with the metrics in this field too, and I can't reproduce the same scores. Did you use the skimage library for the metrics? Can you provide the code so that I can check whether I am using it correctly?

Gogo693 avatar Jan 19 '21 14:01 Gogo693

+1

Ha0Tang avatar Jan 28 '21 15:01 Ha0Tang

We tried to use the official source code for the quantitative evaluation as much as possible. The codebases used for each metric are listed below (a rough usage sketch for SSIM and LPIPS follows the list):

  1. IoU: implemented in MATLAB, following the FCN paper
  2. SSIM: https://www.mathworks.com/help/images/ref/ssim.html
  3. LPIPS (AlexNet): https://github.com/richzhang/PerceptualSimilarity
  4. IS: https://github.com/sundyCoder/IS_MS_SS/blob/master/is/IS.py
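
For reference, here is a rough Python sketch of how SSIM and LPIPS could be computed on the paired try-on results. This is not the exact code we used (SSIM was computed with MATLAB's ssim and LPIPS with the official repo, so scores may differ slightly), and the folder paths below are placeholders:

```python
# Rough sketch only: paper numbers came from MATLAB's ssim and the official
# PerceptualSimilarity repo, so this script may give slightly different values.
# Folder names are placeholders for your paired try-on results and test images.
import glob
import os

import lpips                      # pip install lpips (richzhang/PerceptualSimilarity)
import numpy as np
import torch
from PIL import Image
from skimage.metrics import structural_similarity

loss_fn = lpips.LPIPS(net='alex')  # AlexNet backbone, as reported in the paper

ssim_scores, lpips_scores = [], []
for gen_path in sorted(glob.glob("results/try-on/*.jpg")):                     # generated paired results
    gt_path = os.path.join("data/test/image", os.path.basename(gen_path))      # original person image
    gen = np.array(Image.open(gen_path).convert("RGB"))
    gt = np.array(Image.open(gt_path).convert("RGB").resize(gen.shape[1::-1]))

    # SSIM on RGB images (skimage >= 0.19 uses channel_axis; older versions use multichannel=True)
    ssim_scores.append(structural_similarity(gt, gen, channel_axis=-1, data_range=255))

    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1).unsqueeze(0).float() / 127.5 - 1.0
    with torch.no_grad():
        lpips_scores.append(loss_fn(to_tensor(gt), to_tensor(gen)).item())

print("SSIM :", np.mean(ssim_scores))
print("LPIPS:", np.mean(lpips_scores))
```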

minar09 avatar Jan 30 '21 05:01 minar09

I will try those, thank you so much for the help!

Gogo693 avatar Feb 01 '21 11:02 Gogo693

Hi @Gogo693, where is the target human image for calculating SSIM and LPIPS? And where is the real parsed segmentation map for calculating mIoU? Thanks.

Ha0Tang avatar Feb 01 '21 11:02 Ha0Tang

Hi, I only tried to calculate SSIM, and I used the original person image in the test dataset as the target human image, even though it has a different garment. I can't be sure it is the best solution, but I don't know how else to do it; maybe Minar can help us. As for LPIPS and mIoU, I don't know if I am going to use those (I have not read how they work yet), but I will update if I discover something.

Gogo693 avatar Feb 01 '21 11:02 Gogo693

I only tried to calculate SSIM, and I used the original person image in the test dataset as the target human image, even though it has a different garment. I can't be sure it is the best solution, but I don't know how else to do it

@Gogo693 I think this is not the correct way to calculate SSIM and LPIPS, @minar09 can you help us?

Ha0Tang avatar Feb 01 '21 12:02 Ha0Tang

Hi, except for the Inception Score, all of the metrics require test results generated in the paired setting, i.e., input pairs with the same clothes, so that the person image can be used as the ground truth. Please see our CP-VTON+ or CloTH-VTON paper for these details. Differently clothed pairs are for visual comparison only.

minar09 avatar Feb 01 '21 12:02 minar09

@minar09 thanks, where can I find the details for the paired setting?

Ha0Tang avatar Feb 01 '21 12:02 Ha0Tang

The paired setting means the same setting as training: the same ID is used for both the cloth and the person of an input pair.

minar09 avatar Feb 01 '21 13:02 minar09

So it is paired for SSIM (or whenever we need a comparison) and unpaired for IS, as in the results shown. Thank you, I did not understand that this was the standard for evaluating try-on, but now it is clear!

Gogo693 avatar Feb 01 '21 15:02 Gogo693

I know the difference between the two, but I still don't know how to evaluate it. If someone could provide more instructions, that would be great.

Ha0Tang avatar Feb 02 '21 13:02 Ha0Tang

This is what I understood. When running the test, you run 2 experiments:

  1. You use as input Person P_i matched with Cloth C_i (a paired input, as in training) to generate G_i_i, then you evaluate SSIM by comparing G_i_i with the original image P_i (a rough sketch for building such a paired list is at the end of this comment).
  2. You use as input Person P_i matched with Cloth C_j (an unpaired match, i != j) to generate G_i_j, then you evaluate IS on the generated images alone, without any comparison, since the IS score does not need one. For the unpaired matches there should be a text file in the dataset.

I don't know if this was your doubt, but I hope it is useful.
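
For illustration, here is a rough sketch of how the paired list could be derived from the dataset's test_pairs.txt. The output file name is arbitrary and the naming convention is my assumption based on the VITON data, so please double-check against your setup:

```python
# Illustration only: derive a "same clothes" pairs list from the unpaired test_pairs.txt.
# Assumes the VITON naming convention: person "XXXXXX_0.jpg", cloth "XXXXXX_1.jpg".
with open("data/test_pairs.txt") as f:
    pairs = [line.split() for line in f if line.strip()]

with open("data/test_pairs_paired.txt", "w") as f:   # output file name is arbitrary
    for person, _cloth in pairs:
        f.write(f"{person} {person.replace('_0.jpg', '_1.jpg')}\n")
```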

Gogo693 avatar Feb 02 '21 16:02 Gogo693

Hi @minar09 @Gogo693, sorry for bothering you with this much-discussed problem. About the evaluation of SSIM and LPIPS, is the following understanding correct?

Step 1: Reset the test_pairs.txt file from [000001_0.jpg 001744_1.jpg, 000010_0.jpg 004325_1.jpg, ...] to [000001_0.jpg 000001_1.jpg, 000010_0.jpg 000010_1.jpg, ...].

Step 2: Run testGMM to get the warped clothing.

Step 3: Run testTOM to get the try-on results for the input persons (now we have the try-on results).

Step 4: Compute SSIM and LPIPS between the original person image and the new try-on result (here the original person image is treated as the ground truth).

What's more, I am also a little confused about the calculation of IoU. In the CP-VTON+ paper there is only one sentence: "the parsed segmentation area for the current upper clothing is used as the IoU reference". I guess maybe I should calculate the IoU between the parsed segmentation clothing area (the warped clothing) and the original target clothing?

It would be great if you could shed some light on these problems, and even better if you could provide the overall evaluation tools and instruction documents.

Amazingren avatar Feb 04 '21 12:02 Amazingren

I think that is correct for SSIM and LPIPS. I tried this setting for SSIM in another work and I could replicate the results. For IoU I can't help you yet as I'm not using it, but I will update in case I try it.

Gogo693 avatar Feb 05 '21 11:02 Gogo693

I think that is correct for SSIM and LPIPS. I tried this setting for SSIM in another work and I could replicate the results. For IoU I can't help you yet as I'm not using it, but I will update in case I try it.

Hi @Gogo693, following this approach I can now nearly replicate the results in the paper. Many thanks for the useful feedback!

Amazingren avatar Feb 06 '21 15:02 Amazingren

Hi, thank you everyone for the great discussion. I made a minor update to the repo to help you with the evaluation.

test_pairs_same.txt has been added to the data folder. So, for the quantitative evaluation, you can simply uncomment the line https://github.com/minar09/cp-vton-plus/blob/master/test.py#L36 and comment out the previous one.

For IoU evaluation, the warped clothing mask/silhouette and the target clothing-on-person mask from the ground-truth segmentation are used; you can then calculate the metric from the intersection and union between them.
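
As a rough Python illustration of the idea (our actual evaluation was done in MATLAB, and the mask folder paths below are only placeholders):

```python
# Rough sketch only: IoU between the warped clothing mask and the ground-truth
# clothing-on-person mask. Folder paths are placeholders.
import glob
import os

import numpy as np
from PIL import Image

ious = []
for warp_path in sorted(glob.glob("results/warp-mask/*.jpg")):                        # warped cloth masks
    gt_path = os.path.join("data/test/gt-cloth-mask", os.path.basename(warp_path))    # GT cloth-on-person masks
    warp = np.array(Image.open(warp_path).convert("L")) > 127                         # binarize
    gt = np.array(Image.open(gt_path).convert("L").resize(warp.shape[::-1])) > 127

    inter = np.logical_and(warp, gt).sum()
    union = np.logical_or(warp, gt).sum()
    ious.append(inter / union if union > 0 else 1.0)

print("mean IoU:", np.mean(ious))
```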

Hope these help. Thank you.

minar09 avatar Feb 06 '21 23:02 minar09

Thank you Minar for being very clear and available. If I may ask for one last confirmation on this subject: can you confirm that the IS score is computed on the 'unpaired' generated images?

Gogo693 avatar Feb 15 '21 11:02 Gogo693

Yes, in CP-VTON+, IS is evaluated on the unpaired test cases, meaning the target cloth is different from the outfit the source human is wearing. The evaluation was run on the test_pairs.txt list.
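
Just to illustrate what the metric computes (this is not the exact IS.py script we used, so numbers may differ; the results folder below is a placeholder):

```python
# Rough sketch of the Inception Score on the unpaired try-on results.
import glob

import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Older torchvision API; newer versions use weights="IMAGENET1K_V1" instead of pretrained=True
net = models.inception_v3(pretrained=True, transform_input=False).eval()

probs = []
with torch.no_grad():
    for path in sorted(glob.glob("results/try-on-unpaired/*.jpg")):
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        probs.append(F.softmax(net(x), dim=1).numpy())
probs = np.concatenate(probs, axis=0)

# IS = exp( E_x[ KL( p(y|x) || p(y) ) ] )
p_y = probs.mean(axis=0, keepdims=True)
kl = (probs * (np.log(probs + 1e-12) - np.log(p_y + 1e-12))).sum(axis=1)
print("Inception Score:", float(np.exp(kl.mean())))
```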

minar09 avatar Feb 15 '21 17:02 minar09

@minar09 can you share your IoU evaluation source code?

Ha0Tang avatar Feb 18 '21 08:02 Ha0Tang

Hi @minar09, since the MATLAB IoU code is hard for me to understand, is it okay for me to use jaccard_similarity_score in place of the IoU metric in your paper for evaluating the performance of GMM?

Amazingren avatar Mar 06 '21 12:03 Amazingren

@Ha0Tang, sorry for my late reply; it is hard for me to find time to maintain the repositories nowadays, so I am adding a snapshot of my IoU evaluation code here (screenshot attached). Hope this helps. Thank you.

@Amazingren, sure, the Jaccard index should work as well. Sorry for my late reply. Thank you for your understanding. Have a nice day.
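
One thing to double-check: in recent scikit-learn, jaccard_similarity_score has been replaced by jaccard_score, which, on flattened binary masks, matches the usual IoU. A tiny sanity check (random masks, just for illustration):

```python
# Quick check that sklearn's jaccard_score on flattened binary masks equals IoU.
import numpy as np
from sklearn.metrics import jaccard_score

warp = np.random.rand(256, 192) > 0.5     # stand-in for a warped cloth mask
gt = np.random.rand(256, 192) > 0.5       # stand-in for the ground-truth mask

iou = np.logical_and(warp, gt).sum() / np.logical_or(warp, gt).sum()
jac = jaccard_score(gt.ravel().astype(int), warp.ravel().astype(int))
print(iou, jac)   # the two values should match
```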

minar09 avatar Mar 28 '21 07:03 minar09