Tensorflow_PersonLab

Some confusion about config.py

Open KaiChen1998 opened this issue 5 years ago • 8 comments

First of all, thank you so much for providing such great code! I just don't understand some parameters you use in your code, especially in your data augmentation part.

Specifically, what does TransformationParams.target_dist do? It is set to a constant 0.8. You already apply random scaling, so why add a constant factor on top of it?

Again, thank you for your great code! I have learned a lot from it.

class TransformationParams:
    target_dist = 0.8
    scale_prob = 1.
    scale_min = 0.8
    scale_max = 2.0
    max_rotate_degree = 30.
    center_perterb_max = 20.0
    flip_prob = 0.5

KaiChen1998 avatar Dec 01 '19 02:12 KaiChen1998

It sets the scale of the input image for when you want to train at a single scale rather than multi-scale.

scnuhealthy avatar Dec 04 '19 08:12 scnuhealthy

@scnuhealthy Thank you for your reply! But I'm sorry, I don't quite get your idea. In your code, target_dist changes the parameters of the scale affine transformation matrix, so even when I turn off data augmentation, the affine function still changes the original image. :joy:

KaiChen1998 avatar Dec 05 '19 16:12 KaiChen1998

Thanks for your question. target_dist controls the input resolution when training at a single scale. (Training at a larger resolution usually gives better results.) If you turn off data augmentation and want to train at the original input resolution, just set target_dist=1.0.
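To make the effect on the affine matrix concrete, here is a minimal sketch. It is not the repository's code: `matmul3` and `scale_about_center` are hypothetical helpers, and it assumes the scale is applied about the image center and that, with augmentation off, target_dist is the only remaining scale factor. Under those assumptions, a scale of 1.0 yields the identity matrix, while any other value leaves a non-zero translation in the last column.

```python
# Illustrative only: hypothetical helpers, not the repository's code.

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale_about_center(scale, width, height):
    """Affine matrix that scales an image about its center.

    Built as: move center to origin -> scale -> move origin back to center.
    """
    cx, cy = width / 2.0, height / 2.0
    to_origin = [[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]]
    scaling = [[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]]
    to_center = [[1.0, 0.0, cx], [0.0, 1.0, cy], [0.0, 0.0, 1.0]]
    return matmul3(to_center, matmul3(scaling, to_origin))


# Augmentation off (random scale fixed at 1.0), so only target_dist remains.
identity = scale_about_center(1.0, 401, 401)  # assumed target_dist = 1.0
shrunk = scale_about_center(0.8, 401, 401)    # assumed target_dist = 0.8

print(identity[0][2], identity[1][2])  # translation terms for scale 1.0
print(shrunk[0][2], shrunk[1][2])      # translation terms for scale 0.8
```

The translation term works out to (1 - scale) * center, which is why it vanishes only when the scale is exactly 1.0.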

scnuhealthy avatar Dec 07 '19 14:12 scnuhealthy

Thank you for your reply. But just to make it clear (you know, students' habits :joy: )

  1. So you mean target_dist is just a kind of scale factor, right? But when I visualize your result with data augmentation turned off, I find that the original image has also been cropped, because when you multiply the affine transformation matrices, the target_dist parameter makes the final column of the resulting matrix non-zero. Is that what you want, or is there a mistake here?

  2. Which way do you think is better during inference? Keeping the aspect ratio of the image and then padding, as you have done, or simply resizing the image to [401, 401]?

KaiChen1998 avatar Dec 10 '19 02:12 KaiChen1998

@scnuhealthy BTW, is there any possibility we could talk on Wechat?

KaiChen1998 avatar Dec 10 '19 02:12 KaiChen1998

For the two questions: 1. When visualizing, target_dist should be set to 1.0. I forgot to consider this parameter when coding demo.py.

2. Multi-scale testing will achieve better results. For example, if you train at scale S, test at 0.8*S, 1.0*S, and 1.2*S, and then ensemble the three results, the performance will be better. (This is a common way to improve mAP on the COCO dataset.)
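A rough sketch of that kind of multi-scale ensemble follows. It is only an illustration: `predict` is a hypothetical stand-in for the network, it is assumed to return a map the same size as its input, the ensemble is assumed to be a plain average, and nearest-neighbour resizing is used purely to keep the example dependency-free.

```python
# Illustrative only: `predict` is a hypothetical stand-in for the network.

def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of floats."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


def multi_scale_predict(predict, image, factors=(0.8, 1.0, 1.2)):
    """Run `predict` at several scales and average the maps at the input size."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    for f in factors:
        sh, sw = max(1, round(h * f)), max(1, round(w * f))
        scaled = resize_nearest(image, sh, sw)
        heatmap = predict(scaled)                 # assumed same size as its input
        restored = resize_nearest(heatmap, h, w)  # back to the common size
        for y in range(h):
            for x in range(w):
                total[y][x] += restored[y][x] / len(factors)
    return total


# Dummy usage: a "network" that simply doubles every pixel value.
image = [[1.0] * 5 for _ in range(5)]
result = multi_scale_predict(lambda img: [[2.0 * v for v in row] for row in img], image)
```

In practice the resizing would be done with a proper image library and interpolation; the structure (scale, predict, map back, average) is the part being illustrated.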

scnuhealthy avatar Dec 10 '19 08:12 scnuhealthy

  1. Sorry, I still don't get the point of this parameter. Why do you still need it after you do data augmentation with the affine matrix? (I think that with data augmentation it's no longer single-scale training, right?)

  2. Unfortunately that's not my point. I mean that when you resize the original image to your network input size, there are two ways: keeping the aspect ratio (height to width) and then padding, or resizing the image directly without preserving it. Which one do you think is better?

KaiChen1998 avatar Dec 10 '19 11:12 KaiChen1998

Keeping the ratio through padding is better during inference, and I also use this strategy when training.
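For reference, the arithmetic behind an aspect-preserving resize with padding might look like the sketch below. `letterbox_params` is a hypothetical helper, not a function from the repository; the 401 target comes from the [401, 401] size mentioned earlier in the thread, and how the padding is split between the two sides is left open.

```python
# Illustrative only: hypothetical helper, not the repository's code.

def letterbox_params(width, height, target=401):
    """Return (scale, new_w, new_h, pad_x, pad_y) for an aspect-preserving resize.

    The longer side is scaled to `target`; pad_x and pad_y are the total
    padding needed on each axis to reach a target x target canvas.
    """
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_w, target - new_h
    return scale, new_w, new_h, pad_x, pad_y


scale, new_w, new_h, pad_x, pad_y = letterbox_params(640, 480)
print(new_w, new_h, pad_x, pad_y)
```

Resizing directly to [401, 401] would instead use different scale factors for width and height, which distorts the people in the image.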

scnuhealthy avatar Dec 16 '19 12:12 scnuhealthy