mmsegmentation Weird validation results

Hi, I'm having a weird issue with training a segmentation network. I'm using a custom pre-trained backbone (I cannot disclose more information, as it is part of a research paper we're working on) together with the UperNet segmentation head for ADE20K segmentation on a 160k-iteration schedule (based on the existing configs for ConvNeXt and Swin), with single-scale validation.

During training, everything ran fine. The losses (loss, decode.loss_ce, aux.loss_ce) decrease, and the accuracies (decode.acc_seg, aux.acc_seg) rise to around 70-90% after about 16k iterations. However, during validation, the results (again, at the 16k-iteration mark) are as follows:

+---------------------+-------+-------+
|        Class        |  IoU  |  Acc  |
+---------------------+-------+-------+
|         wall        | 17.54 | 100.0 |
|       building      |  0.0  |  0.0  |
|         sky         |  0.0  |  0.0  |
|        floor        |  0.0  |  0.0  |
|         tree        |  0.0  |  0.0  |
|       ceiling       |  0.0  |  0.0  |
|         road        |  0.0  |  0.0  |
|         bed         |  0.0  |  0.0  |
|      windowpane     |  0.0  |  0.0  |
|        grass        |  0.0  |  0.0  |
|       cabinet       |  0.0  |  0.0  |
|       sidewalk      |  0.0  |  0.0  |
|        person       |  0.0  |  0.0  |
|        earth        |  0.0  |  0.0  |
|         door        |  0.0  |  0.0  |
|        table        |  0.0  |  0.0  |
|       mountain      |  0.0  |  0.0  |
|        plant        |  0.0  |  0.0  |
|       curtain       |  0.0  |  0.0  |
|        chair        |  0.0  |  0.0  |
|         car         |  0.0  |  0.0  |
|        water        |  0.0  |  0.0  |
|       painting      |  0.0  |  0.0  |
|         sofa        |  0.0  |  0.0  |
|        shelf        |  0.0  |  0.0  |
|        house        |  0.0  |  0.0  |
|         sea         |  0.0  |  0.0  |
|        mirror       |  0.0  |  0.0  |
|         rug         |  0.0  |  0.0  |
|        field        |  0.0  |  0.0  |
|       armchair      |  0.0  |  0.0  |
|         seat        |  0.0  |  0.0  |
|        fence        |  0.0  |  0.0  |
|         desk        |  0.0  |  0.0  |
|         rock        |  0.0  |  0.0  |
|       wardrobe      |  0.0  |  0.0  |
|         lamp        |  0.0  |  0.0  |
|       bathtub       |  0.0  |  0.0  |
|       railing       |  0.0  |  0.0  |
|       cushion       |  0.0  |  0.0  |
|         base        |  0.0  |  0.0  |
|         box         |  0.0  |  0.0  |
|        column       |  0.0  |  0.0  |
|      signboard      |  0.0  |  0.0  |
|   chest of drawers  |  0.0  |  0.0  |
|       counter       |  0.0  |  0.0  |
|         sand        |  0.0  |  0.0  |
|         sink        |  0.0  |  0.0  |
|      skyscraper     |  0.0  |  0.0  |
|      fireplace      |  0.0  |  0.0  |
|     refrigerator    |  0.0  |  0.0  |
|      grandstand     |  0.0  |  0.0  |
|         path        |  0.0  |  0.0  |
|        stairs       |  0.0  |  0.0  |
|        runway       |  0.0  |  0.0  |
|         case        |  0.0  |  0.0  |
|      pool table     |  0.0  |  0.0  |
|        pillow       |  0.0  |  0.0  |
|     screen door     |  0.0  |  0.0  |
|       stairway      |  0.0  |  0.0  |
|        river        |  0.0  |  0.0  |
|        bridge       |  0.0  |  0.0  |
|       bookcase      |  0.0  |  0.0  |
|        blind        |  0.0  |  0.0  |
|     coffee table    |  0.0  |  0.0  |
|        toilet       |  0.0  |  0.0  |
|        flower       |  0.0  |  0.0  |
|         book        |  0.0  |  0.0  |
|         hill        |  0.0  |  0.0  |
|        bench        |  0.0  |  0.0  |
|      countertop     |  0.0  |  0.0  |
|        stove        |  0.0  |  0.0  |
|         palm        |  0.0  |  0.0  |
|    kitchen island   |  0.0  |  0.0  |
|       computer      |  0.0  |  0.0  |
|     swivel chair    |  0.0  |  0.0  |
|         boat        |  0.0  |  0.0  |
|         bar         |  0.0  |  0.0  |
|    arcade machine   |  0.0  |  0.0  |
|        hovel        |  0.0  |  0.0  |
|         bus         |  0.0  |  0.0  |
|        towel        |  0.0  |  0.0  |
|        light        |  0.0  |  0.0  |
|        truck        |  0.0  |  0.0  |
|        tower        |  0.0  |  0.0  |
|      chandelier     |  0.0  |  0.0  |
|        awning       |  0.0  |  0.0  |
|     streetlight     |  0.0  |  0.0  |
|        booth        |  0.0  |  0.0  |
| television receiver |  0.0  |  0.0  |
|       airplane      |  0.0  |  0.0  |
|      dirt track     |  0.0  |  0.0  |
|       apparel       |  0.0  |  0.0  |
|         pole        |  0.0  |  0.0  |
|         land        |  0.0  |  0.0  |
|      bannister      |  0.0  |  0.0  |
|      escalator      |  0.0  |  0.0  |
|       ottoman       |  0.0  |  0.0  |
|        bottle       |  0.0  |  0.0  |
|        buffet       |  0.0  |  0.0  |
|        poster       |  0.0  |  0.0  |
|        stage        |  0.0  |  0.0  |
|         van         |  0.0  |  0.0  |
|         ship        |  0.0  |  0.0  |
|       fountain      |  0.0  |  0.0  |
|    conveyer belt    |  0.0  |  0.0  |
|        canopy       |  0.0  |  0.0  |
|        washer       |  0.0  |  0.0  |
|      plaything      |  0.0  |  0.0  |
|    swimming pool    |  0.0  |  0.0  |
|        stool        |  0.0  |  0.0  |
|        barrel       |  0.0  |  0.0  |
|        basket       |  0.0  |  0.0  |
|      waterfall      |  0.0  |  0.0  |
|         tent        |  0.0  |  0.0  |
|         bag         |  0.0  |  0.0  |
|       minibike      |  0.0  |  0.0  |
|        cradle       |  0.0  |  0.0  |
|         oven        |  0.0  |  0.0  |
|         ball        |  0.0  |  0.0  |
|         food        |  0.0  |  0.0  |
|         step        |  0.0  |  0.0  |
|         tank        |  0.0  |  0.0  |
|      trade name     |  0.0  |  0.0  |
|      microwave      |  0.0  |  0.0  |
|         pot         |  0.0  |  0.0  |
|        animal       |  0.0  |  0.0  |
|       bicycle       |  0.0  |  0.0  |
|         lake        |  0.0  |  0.0  |
|      dishwasher     |  0.0  |  0.0  |
|        screen       |  0.0  |  0.0  |
|       blanket       |  0.0  |  0.0  |
|      sculpture      |  0.0  |  0.0  |
|         hood        |  0.0  |  0.0  |
|        sconce       |  0.0  |  0.0  |
|         vase        |  0.0  |  0.0  |
|    traffic light    |  0.0  |  0.0  |
|         tray        |  0.0  |  0.0  |
|        ashcan       |  0.0  |  0.0  |
|         fan         |  0.0  |  0.0  |
|         pier        |  0.0  |  0.0  |
|      crt screen     |  0.0  |  0.0  |
|        plate        |  0.0  |  0.0  |
|       monitor       |  0.0  |  0.0  |
|    bulletin board   |  0.0  |  0.0  |
|        shower       |  0.0  |  0.0  |
|       radiator      |  0.0  |  0.0  |
|        glass        |  0.0  |  0.0  |
|        clock        |  0.0  |  0.0  |
|         flag        |  0.0  |  0.0  |
+---------------------+-------+-------+
06/15 20:19:41 - mmengine - INFO - Iter(val) [500/500]    aAcc: 17.5400  mIoU: 0.1200  mAcc: 0.6700  data_time: 0.0018  time: 0.1460
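The summary line above is what one would expect if the model predicted `wall` for every validation pixel. Here is a minimal sketch of that arithmetic, assuming the usual per-class definitions IoU = TP / (TP + FP + FN) and Acc = TP / (TP + FN), and assuming every class occurs in the ground truth so that unpredicted classes score 0, as in the table. The helper name `collapsed_metrics` is hypothetical and not part of mmsegmentation.

```python
def collapsed_metrics(wall_share_percent, num_classes):
    """Metrics when every pixel is predicted as a single class (here: wall).

    wall_share_percent: percentage of labelled pixels whose ground truth is wall.
    """
    # For the predicted class, every wall pixel is a true positive and every
    # other pixel is a false positive, so IoU = TP / (TP + FP) = wall share.
    wall_iou = wall_share_percent
    # No wall pixel is missed (FN = 0), so the per-class accuracy is 100%.
    wall_acc = 100.0
    # All other classes have TP = 0, hence IoU = Acc = 0 for each of them.
    a_acc = wall_share_percent        # overall pixel accuracy
    m_iou = wall_iou / num_classes    # mean IoU over all classes
    m_acc = wall_acc / num_classes    # mean per-class accuracy
    return a_acc, m_iou, m_acc


a_acc, m_iou, m_acc = collapsed_metrics(17.54, 150)
print(f"aAcc: {a_acc:.2f}  mIoU: {m_iou:.2f}  mAcc: {m_acc:.2f}")
```

The printed values agree with the logged aAcc, mIoU and mAcc. This suggests that at evaluation time the model outputs one class everywhere rather than noisy predictions, although it does not say why.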

Relevant packages:

python                    3.9.16
pytorch                   1.13.1
cudatoolkit               11.6.0
mmcv                      2.0.0
mmsegmentation            1.0.0

I built mmsegmentation from source following the instructions in the repo.

Has anyone come across a similar issue? What might be causing it?

Jun 15 '23 17:06 shahaffind

I'm having the same problem with a custom backbone for SegFormer that I obtained by training with the standard configuration in the repo. Has anyone found a solution for this?

Mar 26 '24 15:03 manu15sd

I am also getting this error with fcn-unet, with a custom dataset conforming to the standard dataset format.

Edit: However, mine is only binary segmentation, with background and foreground classes.

Mar 26 '24 22:03 jacksteussie

I'm using the following workaround: since my custom backbone was trained with this framework, I resume that training run instead of starting a new one, changing the parameters (in my case, the training data and the number of iterations). It works for me as a fine-tuning strategy.
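A minimal sketch of what such a resume-based setup could look like as an MMEngine-style Python config fragment. `load_from`, `resume` and `train_cfg` are standard config fields, but the checkpoint path and iteration counts below are placeholders and assumptions, not values from this thread.

```python
# Hypothetical config fragment; path and numbers are placeholders.
load_from = 'work_dirs/pretraining_run/iter_160000.pth'  # assumed checkpoint path
resume = True  # continue from `load_from`, restoring optimizer state and iteration counter

# Because the iteration counter is restored from the checkpoint, max_iters has
# to be larger than the stored iteration for any further training to happen.
train_cfg = dict(type='IterBasedTrainLoop', max_iters=200000, val_interval=16000)
```

With `resume = False` and the same `load_from`, only the weights would be loaded and training would start again from iteration 0, which is the more conventional way to fine-tune.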

Mar 27 '24 08:03 manu15sd

I'm using the following workaround: since my custom backbone was trained with this framework, I resume that training run instead of starting a new one, changing the parameters (in my case, the training data and the number of iterations). It works for me as a fine-tuning strategy.

So just making it a fine-tuned model made the metrics change from zero?

Mar 27 '24 18:03 jacksteussie

Bump. I'm having a similar issue with linear segmentation. The validation mIoU stays essentially flat from the beginning of training:

2025/07/28 11:37:02 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.9400 mIoU: 11.1100 mAcc: 14.8000 data_time: 0.0011 time: 0.0136 
2025/07/28 11:56:56 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 68.0500 mIoU: 11.1700 mAcc: 14.8300 data_time: 0.0009 time: 0.0132 
2025/07/28 12:16:56 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.9700 mIoU: 11.2400 mAcc: 14.9600 data_time: 0.0009 time: 0.0132 
2025/07/28 12:37:01 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.8400 mIoU: 11.0200 mAcc: 14.5700 data_time: 0.0009 time: 0.0135 
2025/07/28 12:57:11 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 68.0400 mIoU: 10.7700 mAcc: 14.3700 data_time: 0.0009 time: 0.0133
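To put a number on how flat these results are, here is a small sketch that extracts the metrics from the five log lines above and prints their spread across validation runs. The helper name `metric_series` is hypothetical, and the regular expression assumes the `name: value` layout shown in the log.

```python
import re

log = """\
2025/07/28 11:37:02 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.9400 mIoU: 11.1100 mAcc: 14.8000 data_time: 0.0011 time: 0.0136
2025/07/28 11:56:56 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 68.0500 mIoU: 11.1700 mAcc: 14.8300 data_time: 0.0009 time: 0.0132
2025/07/28 12:16:56 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.9700 mIoU: 11.2400 mAcc: 14.9600 data_time: 0.0009 time: 0.0132
2025/07/28 12:37:01 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.8400 mIoU: 11.0200 mAcc: 14.5700 data_time: 0.0009 time: 0.0135
2025/07/28 12:57:11 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 68.0400 mIoU: 10.7700 mAcc: 14.3700 data_time: 0.0009 time: 0.0133
"""


def metric_series(text, name):
    """Return every value logged as '<name>: <float>', in order of appearance."""
    pattern = rf"\b{re.escape(name)}: ([0-9.]+)"
    return [float(value) for value in re.findall(pattern, text)]


for name in ("aAcc", "mIoU", "mAcc"):
    values = metric_series(log, name)
    spread = round(max(values) - min(values), 2)
    print(f"{name}: {values}  spread: {spread}")
```

All three metrics vary by well under one percentage point across the five validation runs.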

Jul 30 '25 08:07 KarahanS

backbone_norm_cfg = dict(eps=1e-06, requires_grad=True, type='LN')
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segmenter/vit_small_p16_384_20220308-410f6037.pth'
crop_size = (
    512,
    512,
)
data_preprocessor = dict(
    bgr_to_rgb=True,
    mean=[
        123.675,
        116.28,
        103.53,
    ],
    pad_val=0,
    seg_pad_val=255,
    size=(
        512,
        512,
    ),
    std=[
        58.395,
        57.12,
        57.375,
    ],
    type='SegDataPreProcessor')
data_root = '/scratch/work/saritak1/datasets/ADEChallengeData2016'
dataset_type = 'ADE20KDataset'
default_hooks = dict(
    checkpoint=dict(by_epoch=False, interval=160000, type='CheckpointHook'),
    logger=dict(interval=50, log_metric_by_epoch=False, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(type='SegVisualizationHook'))
default_scope = 'mmseg'
env_cfg = dict(
    cudnn_benchmark=True,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
img_ratios = [
    0.5,
    0.75,
    1.0,
    1.25,
    1.5,
    1.75,
]
launcher = 'pytorch'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
model = dict(
    backbone=dict(
        attn_drop_rate=0.0,
        drop_path_rate=0.1,
        drop_rate=0.0,
        embed_dims=384,
        final_norm=True,
        frozen_stages=12,
        img_size=(
            512,
            512,
        ),
        in_channels=3,
        init_cfg=dict(
            checkpoint=
            '/scratch/work/saritak1/checkpoints/dino_Li/converted.pth',
            type='Pretrained'),
        interpolate_mode='bicubic',
        norm_cfg=dict(eps=1e-06, requires_grad=True, type='LN'),
        num_heads=6,
        num_layers=12,
        out_indices=[
            11,
        ],
        patch_size=16,
        type='FreezableVisionTransformer',
        with_cls_token=True),
    data_preprocessor=dict(
        bgr_to_rgb=True,
        mean=[
            123.675,
            116.28,
            103.53,
        ],
        pad_val=0,
        seg_pad_val=255,
        size=(
            512,
            512,
        ),
        std=[
            58.395,
            57.12,
            57.375,
        ],
        type='SegDataPreProcessor'),
    decode_head=dict(
        channels=384,
        concat_input=False,
        dropout_ratio=0.0,
        in_channels=[
            384,
        ],
        in_index=[
            0,
        ],
        input_transform='resize_concat',
        loss_decode=dict(
            loss_weight=1.0, type='CrossEntropyLoss', use_sigmoid=False),
        num_classes=150,
        num_convs=0,
        type='FCNHead'),
    test_cfg=dict(crop_size=(
        512,
        512,
    ), mode='slide', stride=(
        480,
        480,
    )),
    type='EncoderDecoder')
optim_wrapper = dict(
    clip_grad=None,
    optimizer=dict(lr=0.0001, type='Adam', weight_decay=0.05),
    type='OptimWrapper')
optimizer = dict(lr=0.0001, type='Adam', weight_decay=0.0)
param_scheduler = [
    dict(
        begin=0,
        by_epoch=False,
        end=160000,
        eta_min=1e-05,
        power=0.9,
        type='PolyLR'),
]
resume = True
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        data_root='/scratch/work/saritak1/datasets/ADEChallengeData2016',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(keep_ratio=True, scale=(
                2048,
                512,
            ), type='Resize'),
            dict(reduce_zero_label=True, type='LoadAnnotations'),
            dict(type='PackSegInputs'),
        ],
        type='ADE20KDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
    iou_metrics=[
        'mIoU',
    ], type='IoUMetric')
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(keep_ratio=True, scale=(
        2048,
        512,
    ), type='Resize'),
    dict(reduce_zero_label=True, type='LoadAnnotations'),
    dict(type='PackSegInputs'),
]
train_cfg = dict(
    max_iters=160000, type='IterBasedTrainLoop', val_interval=16000)
train_dataloader = dict(
    batch_size=16,
    dataset=dict(
        data_prefix=dict(
            img_path='images/training', seg_map_path='annotations/training'),
        data_root='/scratch/work/saritak1/datasets/ADEChallengeData2016',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(reduce_zero_label=True, type='LoadAnnotations'),
            dict(
                keep_ratio=True,
                ratio_range=(
                    0.5,
                    2.0,
                ),
                scale=(
                    2048,
                    512,
                ),
                type='RandomResize'),
            dict(
                cat_max_ratio=0.75, crop_size=(
                    512,
                    512,
                ), type='RandomCrop'),
            dict(prob=0.5, type='RandomFlip'),
            dict(type='PhotoMetricDistortion'),
            dict(type='PackSegInputs'),
        ],
        type='ADE20KDataset'),
    num_workers=8,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='InfiniteSampler'))
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(reduce_zero_label=True, type='LoadAnnotations'),
    dict(
        keep_ratio=True,
        ratio_range=(
            0.5,
            2.0,
        ),
        scale=(
            2048,
            512,
        ),
        type='RandomResize'),
    dict(cat_max_ratio=0.75, crop_size=(
        512,
        512,
    ), type='RandomCrop'),
    dict(prob=0.5, type='RandomFlip'),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs'),
]
tta_model = dict(type='SegTTAModel')
tta_pipeline = [
    dict(backend_args=None, type='LoadImageFromFile'),
    dict(
        transforms=[
            [
                dict(keep_ratio=True, scale_factor=0.5, type='Resize'),
                dict(keep_ratio=True, scale_factor=0.75, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.0, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.25, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.5, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.75, type='Resize'),
            ],
            [
                dict(direction='horizontal', prob=0.0, type='RandomFlip'),
                dict(direction='horizontal', prob=1.0, type='RandomFlip'),
            ],
            [
                dict(type='LoadAnnotations'),
            ],
            [
                dict(type='PackSegInputs'),
            ],
        ],
        type='TestTimeAug'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        data_root='/scratch/work/saritak1/datasets/ADEChallengeData2016',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(keep_ratio=True, scale=(
                2048,
                512,
            ), type='Resize'),
            dict(reduce_zero_label=True, type='LoadAnnotations'),
            dict(type='PackSegInputs'),
        ],
        type='ADE20KDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
    iou_metrics=[
        'mIoU',
    ], type='IoUMetric')
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='SegLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ])
work_dir = '/scratch/work/saritak1/segmentation/output_debug/dino_Li/lr_1e-4/vitb_linear_fcn_ade20k_lr1e-4_12_0.0001_1'

Jul 30 '25 08:07 KarahanS