
About configurations

Open hellbell opened this issue 5 years ago • 16 comments

First, thank you for your kind paper and GitHub page. Your work is very useful for studying text detection with a Mask R-CNN baseline. I am reproducing the results of PMTD, but my results are a little worse (Mask R-CNN baseline: 60% F-measure on the MLT dataset), so I'm trying to figure out what is wrong with my configuration. It would be very helpful if the config file (.yaml) were provided, or if you could let me know the RPN.ANCHOR_STRIDE setting (currently I'm using (4, 8, 16, 32, 64)). Thanks!

hellbell avatar Jul 25 '19 09:07 hellbell

I think you may have run into the same problem I met before. You can have a look at my issue; the author gives some useful advice there.

kapness avatar Aug 05 '19 13:08 kapness

@kapness Thank you for the kind reply! I followed your issue, but the results were still worse than I expected. It would be very helpful if you could share your config file (.yaml) :) Thank you again.

hellbell avatar Aug 06 '19 00:08 hellbell

If you implement the data augmentation correctly in transform.py, the F-score can reach 72% without any other changes. I did not change the original yaml file.

kapness avatar Aug 06 '19 01:08 kapness

My batch size is 16 and the LR starts at 0.01.

kapness avatar Aug 06 '19 01:08 kapness

@kapness Thank you for your advice. I will try it right now!

hellbell avatar Aug 06 '19 15:08 hellbell

@kapness Thanks a lot!

JingChaoLiu avatar Aug 07 '19 05:08 JingChaoLiu

@hellbell Also, _C.MODEL.RPN.ASPECT_RATIOS in defaults.py should be modified as described in the paper. I forgot to mention this tip before.
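
For reference, a minimal sketch of that change, assuming the standard maskrcnn-benchmark layout (maskrcnn_benchmark/config/defaults.py, built on yacs) and the aspect ratios quoted later in this thread; verify the values against the paper:

# Sketch of the relevant lines in maskrcnn_benchmark/config/defaults.py
# (path and default value assumed from maskrcnn-benchmark).
from yacs.config import CfgNode as CN

_C = CN()
_C.MODEL = CN()
_C.MODEL.RPN = CN()
# The maskrcnn-benchmark default is (0.5, 1.0, 2.0); these are text-shaped anchor ratios.
_C.MODEL.RPN.ASPECT_RATIOS = (0.17, 0.44, 1.13, 2.90, 7.46)

The same value can also be overridden from the .yaml config, as done later in this thread.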

kapness avatar Aug 07 '19 06:08 kapness

@kapness @JingChaoLiu Thank you for your kind replies. I trained vanilla Mask R-CNN on ICDAR2017-MLT and got an F-score of only 62%, which is still far below the baseline. My settings:

  • based on e2e_mask_rcnn_R_50_FPN_1x.yaml
  • changed MODEL.RPN.ASPECT_RATIOS: (0.17, 0.44, 1.13, 2.90, 7.46)
  • changed MODEL.RPN.FPN_POST_NMS_PER_BATCH = False
  • 4 GPUs with the following solver settings:
SOLVER:
  BASE_LR: 0.01
  WEIGHT_DECAY: 0.0001
  STEPS: (50000, 80000)
  MAX_ITER: 100000
  IMS_PER_BATCH: 16

My questions are:

  • At test time, the confidence score threshold for selecting valid bounding boxes is set to 0.5. Is that okay?
  • I guess my data augmentation in transform.py might be wrong. Would you share your transform.py file or give me some tips? I posted my code snippet below.
import random

from torchvision.transforms import functional as F


class RandomSampleCrop(object):
    def __init__(self, crop_size=640, min_size=640, max_size=2560):
        self.crop_size = crop_size
        self.min_size = min_size
        self.max_size = max_size

    def get_size(self):
        # Sample a random (height, width) to resize to before cropping.
        w_resize = random.randint(self.min_size, self.max_size)
        h_resize = random.randint(self.min_size, self.max_size)
        return (h_resize, w_resize)

    def __call__(self, image, target):
        while True:
            # Resize the image and its targets to a random size.
            resized_size = self.get_size()
            image_r = F.resize(image, resized_size)
            target_r = target.resize(image_r.size)

            # Sample a random crop window of crop_size x crop_size.
            width, height = image_r.size
            crop_left = random.randint(0, width - self.crop_size)
            crop_top = random.randint(0, height - self.crop_size)
            crop_box = [crop_left, crop_top,
                        crop_left + self.crop_size, crop_top + self.crop_size]
            target_r_c = target_r.crop(crop_box)
            target_r_c = target_r_c.clip_to_image()

            # Retry if the crop contains no boxes or any degenerate box.
            if len(target_r_c) > 0:
                too_small = False
                for t in target_r_c.bbox:
                    w, h = t[2] - t[0], t[3] - t[1]
                    if w < 1 or h < 1:
                        too_small = True
                if too_small:
                    continue
                break

        image_r_c = image_r.crop(crop_box)
        return image_r_c, target_r_c

Many thanks!

hellbell avatar Aug 15 '19 07:08 hellbell

If you use the original crop function implemented by maskrcnn-benchmark, that may be the problem. I don't think it crops the mask ground truth properly. You can see its source code in modeling/structure.

kapness avatar Aug 15 '19 07:08 kapness

@kapness I checked the crop function with some visualization. It seems OK.

hellbell avatar Aug 15 '19 07:08 hellbell

@kapness thanks again for your reply. @hellbell

  1. Following the previous answers and the paper, here is a configuration I just wrote. Sorry, I had no time to validate it, so there is no guarantee on the F-measure.
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ANCHOR_SIZES: (16, 32, 64, 128, 256)
    ASPECT_RATIOS: (0.17, 0.44, 1.13, 2.90, 7.46)
    STRADDLE_THRESH: 10 # remove RPN anchors that go outside the image by more than STRADDLE_THRESH pixels
      # I accidentally changed this value from 0 to 10 at an early stage and forgot to change it back, but I think it makes no difference.
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_PER_BATCH: False
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    NUM_CLASSES: 2
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: True
DATASETS:
  TRAIN: ("icdar_2017_mlt_train", "icdar_2017_mlt_val")
  TEST: ("icdar_2017_mlt_test",)
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  WARMUP_METHOD: 'linear' # PMTD uses 'exponential', which is not implemented in maskrcnn-benchmark
  WARMUP_ITERS: 4500 # warmup_iter = (image_num=9000 * warmup_epoch=8 / batch_size=16)
  IMS_PER_BATCH: 16
  BASE_LR: 0.02 # PMTD uses batch_size * 0.00125 with syncBN
  WEIGHT_DECAY: 0.0001
  STEPS: (49500, 76500) # warmup_iter + (iter * 0.5, iter * 0.8)
  MAX_ITER: 94500 # iter = (image_num=9000 * epoch=160 / batch_size=16) = 90000, max_iter = warmup_iter + iter
  2. Have you done a grid search over the final NMS parameters (cls_threshold, nms_threshold)? See #4 for more details. This can make a bigger difference than some negligible training details.

  3. See #5 for the problematic crop operation. There are two problems. First, the number of points in the cropped mask may vary from 3 to 8, no longer a constant 4. Second, the bounding box obtained by cropping the original bounding box differs from the correct bounding box of the cropped mask. A short illustration follows below.
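
A minimal sketch of those two problems, using shapely (not a dependency of this repository, purely for illustration) to intersect a hypothetical text quadrilateral with a crop window; the coordinates are made up:

from shapely.geometry import Polygon, box

# A convex text quadrilateral (4 points) and a crop window that cuts off corners on two sides.
quad = Polygon([(10, 40), (120, 10), (130, 50), (20, 80)])
crop_window = box(15, 0, 100, 100)

clipped = quad.intersection(crop_window)

# Problem one: the clipped polygon is no longer a quadrilateral.
num_points = len(clipped.exterior.coords) - 1  # exclude the repeated closing point
print(num_points)  # 5 for this example; in general 3..8 instead of a constant 4

# Problem two: clipping the original axis-aligned bounding box is not the same as
# recomputing the bounding box from the clipped polygon.
cropped_orig_box = box(*quad.bounds).intersection(crop_window).bounds
correct_box = clipped.bounds
print(cropped_orig_box)  # (15.0, 10.0, 100.0, 80.0)
print(correct_box)       # y_min differs, because the corner with the smallest y was cropped away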

JingChaoLiu avatar Aug 15 '19 09:08 JingChaoLiu

Now I have one question about OHEM. In the paper, you compute 512 proposals for OHEM in roi_heads, is that right? (Or should I modify the RPN branch instead?) But my batch size is smaller than yours: for example, my batch size is 16 and each GPU processes 8 images. Does that make a difference to OHEM? In maskrcnn-benchmark, the roi_head branch gets 512 proposals per image. Thanks for your kind reply again. I think this is my last question about the baseline...

kapness avatar Aug 15 '19 10:08 kapness

OHEM is done in the bbox branch, not in the RPN. Compared with the inference data flow described in #4, the training data flow is as follows; some details about the losses are also added.

  1. image -> backbone
  2. -> RPN
     • pred_cls, pred_reg = RPN.forward(All proposals)
     • randomly sample sample_num = RPN.BATCH_SIZE_PER_IMAGE=256 * image_num proposals to calculate the loss (sample_num is far less than len(All proposals))
     • postprocess All proposals to output MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN * image_num proposals, given RPN.FPN_POST_NMS_PER_BATCH = False
  3. RPN -> bbox branch
     • pred_cls, pred_reg = bbox.forward(the proposals output by the RPN)
     • randomly sample ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals to calculate the loss
     • (add OHEM here) sort all ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals by cls_loss + reg_loss, keep the loss of the top 512 proposals, and set the loss of the other proposals to 0 (a sketch follows this list)
  4. RPN -> mask branch
     • mask_pred = mask.forward(the positive proposals output by the RPN)
     • calculate the mask loss for all predicted masks
  5. backward the loss to update the parameters
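
A minimal sketch of the OHEM step in step 3, assuming per-proposal cls_loss and reg_loss tensors computed with reduction='none'; the function name and normalization are illustrative, not the repository's actual code:

import torch

def ohem_box_loss(cls_loss, reg_loss, pos_mask, keep_num=512):
    # cls_loss, reg_loss: per-proposal losses of shape [N], N = 512 * image_num
    # pos_mask: bool tensor of shape [N], True for positive proposals
    # Negative proposals have no regression target, so zero their reg loss
    # before sorting (see the follow-up exchange below).
    reg_loss = torch.where(pos_mask, reg_loss, torch.zeros_like(reg_loss))

    total = cls_loss + reg_loss
    keep_num = min(keep_num, total.numel())
    # Keep the keep_num hardest proposals; all other proposals contribute zero loss.
    _, keep_idx = torch.topk(total, keep_num)
    keep_mask = torch.zeros_like(total)
    keep_mask[keep_idx] = 1.0
    return (total * keep_mask).sum() / keep_num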

my batch size is smaller than yours, for example, batch size is 16 and each GPU computes 8 images. Does it make a difference to OHEM?

batch_size = 16 is enough.

JingChaoLiu avatar Aug 15 '19 13:08 JingChaoLiu

Thanks very much for saving me!

kapness avatar Aug 15 '19 13:08 kapness

I'm sorry to disturb you again.

(add OHEM here) sort all ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals by cls_loss + reg_loss, then keep the loss of top 512 proposals, and set the loss of the other proposals to 0.

Here, in the original code, the reg_loss is only computed for positive proposals. Should I first set the reg loss of the negative proposals to 0, then add cls_loss to reg_loss and sort?

kapness avatar Aug 16 '19 07:08 kapness

Yes, for the negative proposals, just set the reg loss to 0 before sorting.

JingChaoLiu avatar Aug 16 '19 07:08 JingChaoLiu