MaskDINO large gap for reproducing the semantic segmentation results

Hi, thanks for the excellent work. I'm trying to reproduce the semantic segmentation results (ResNet-50 backbone + ADE20K). However, I get 46.6% mIoU, which is 2.1% lower than yours. I ran the experiment three times, and every run came out at around 46.6%.

The config file I used is configs/ade20k/semantic-segmentation/maskdino_R50_bs16_160k_steplr.yaml, whose name indicates 160K training iterations, the same as mentioned in the paper. However, the model available for download is maskdino_r50_50ep_100q_celoss_hid1024_3s_semantic_ade20k_48.7miou.pth, whose name indicates 50 training epochs. It seems the two were trained with different configs. Is there something I missed when training this model?

Aug 01 '23 20:08 ZhengyuXia

@ZhengyuXia I am

[08/02 08:49:23 d2.engine.defaults]: Evaluation results for ade20k_sem_seg_val in csv format:
[08/02 08:49:23 d2.evaluation.testing]: copypaste: Task: sem_seg
[08/02 08:49:23 d2.evaluation.testing]: copypaste: mIoU,fwIoU,mACC,pACC
[08/02 08:49:23 d2.evaluation.testing]: copypaste: 45.5368,70.6117,59.3918,81.6061

I have run it three times, and the results are all similar.

Aug 02 '23 03:08 hhaAndroid

@hhaAndroid

I rolled back the Python version from 3.8 to 3.7, and the performance increased by ~0.4% mIoU. I also enabled "SyncBN" in the config file, which gave an additional ~0.5% mIoU improvement. So far, my best reproduction result is 47.6%, which is still ~1% lower than the paper's result.

Aug 03 '23 19:08 ZhengyuXia

ade_48.7log.txt Hi, above is my log file for the 48.7 result for your reference. ADE20K is a small dataset, so the performance may not be very stable. I will also check the code to see if something went wrong.

Aug 04 '23 19:08 FengLi-ust

@FengLi-ust

Thanks for uploading the log file. I roughly checked the settings in this file and found several differences.

NORM in ResNet

The NORM setting in the log file is FrozenBN:

  RESNETS:
    DEFORM_MODULATED: False
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE: [False, False, False, False]
    DEPTH: 50
    NORM: FrozenBN

But it is commented out in the given YAML file:

  RESNETS:
    DEPTH: 50
    STEM_TYPE: "basic"  # not used
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: False
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
    # NORM: "SyncBN"

The CLASS_WEIGHT and DEC_LAYERS settings are different

CLASS_WEIGHT is 2.0 and DEC_LAYERS is 10 in the log file:

  MASK_FORMER:
    BOX_LOSS: True
    BOX_WEIGHT: 5.0
    CLASS_WEIGHT: 2.0
    DEC_LAYERS: 10

But CLASS_WEIGHT is 4.0 and DEC_LAYERS is 9 in the YAML file:

  MaskDINO:
    TRANSFORMER_DECODER_NAME: "MaskDINODecoder"
    DEEP_SUPERVISION: True
    NO_OBJECT_WEIGHT: 0.1
    CLASS_WEIGHT: 4.0
    MASK_WEIGHT: 5.0
    DICE_WEIGHT: 5.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 100
    NHEADS: 8
    DROPOUT: 0.0
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    PRE_NORM: False
    ENFORCE_INPUT_PROJ: False
    SIZE_DIVISIBILITY: 32
    DEC_LAYERS: 9  # 9 decoder layers, add one for the loss on learnable query
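
A comparison like the one above can be automated rather than done by eye. The following is a minimal sketch, not part of the MaskDINO codebase: it flattens two nested config dictionaries and reports every key whose value differs. The helper names (`flatten`, `diff_configs`) and the `MISSING` sentinel are made up for illustration. The two dictionaries hold only the values quoted in this thread. The log lists the loss and decoder settings under MASK_FORMER while the YAML lists them under MaskDINO, so the sketch places both under a placeholder key `DECODER_CFG`, which assumes those two sections correspond.

```python
MISSING = "<missing>"  # hypothetical sentinel for a key absent on one side


def flatten(cfg, prefix=""):
    """Flatten a nested dict into {"A.B.C": value} pairs."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    flat_a, flat_b = flatten(a), flatten(b)
    return {
        key: (flat_a.get(key, MISSING), flat_b.get(key, MISSING))
        for key in sorted(flat_a.keys() | flat_b.keys())
        if flat_a.get(key, MISSING) != flat_b.get(key, MISSING)
    }


# Values quoted from the uploaded log file (DECODER_CFG is a placeholder key).
log_cfg = {
    "RESNETS": {"DEPTH": 50, "NORM": "FrozenBN"},
    "DECODER_CFG": {"CLASS_WEIGHT": 2.0, "DEC_LAYERS": 10},
}

# Values quoted from the YAML file; NORM is commented out there, so it is absent.
yaml_cfg = {
    "RESNETS": {"DEPTH": 50},
    "DECODER_CFG": {"CLASS_WEIGHT": 4.0, "DEC_LAYERS": 9},
}

differences = diff_configs(log_cfg, yaml_cfg)
for key, (in_log, in_yaml) in differences.items():
    print(f"{key}: log={in_log!r} yaml={in_yaml!r}")
```

A key reported as missing on the YAML side only means the file does not set it explicitly; the value actually used in training then comes from whatever default the framework applies.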

I tried applying all or some of these settings, but the best result was ~47.1% mIoU, which is still ~1.6% lower.

Aug 14 '23 15:08 ZhengyuXia

I found that my training loss is significantly larger than the one in your log file. Is that expected?

Feb 21 '25 19:02 Linwei-Chen
