Weird validation results
Hi, I'm having a weird issue while training a segmentation network. I'm using a custom pre-trained backbone (I cannot disclose more details, as it is part of a research paper we're working on) together with the UperNet segmentation head for ADE20K segmentation on the 160k-iteration schedule (based on the existing ConvNeXt and Swin configs), with single-scale validation.
During training, everything looks fine. The losses (loss, decode.loss_ce, aux.loss_ce) decrease, and the accuracies (decode.acc_seg, aux.acc_seg) rise to roughly 70-90% after about 16k iterations. However, the validation results at the same 16k-iteration mark are as follows:
+---------------------+-------+-------+
| Class | IoU | Acc |
+---------------------+-------+-------+
| wall | 17.54 | 100.0 |
| building | 0.0 | 0.0 |
| sky | 0.0 | 0.0 |
| floor | 0.0 | 0.0 |
| tree | 0.0 | 0.0 |
| ceiling | 0.0 | 0.0 |
| road | 0.0 | 0.0 |
| bed | 0.0 | 0.0 |
| windowpane | 0.0 | 0.0 |
| grass | 0.0 | 0.0 |
| cabinet | 0.0 | 0.0 |
| sidewalk | 0.0 | 0.0 |
| person | 0.0 | 0.0 |
| earth | 0.0 | 0.0 |
| door | 0.0 | 0.0 |
| table | 0.0 | 0.0 |
| mountain | 0.0 | 0.0 |
| plant | 0.0 | 0.0 |
| curtain | 0.0 | 0.0 |
| chair | 0.0 | 0.0 |
| car | 0.0 | 0.0 |
| water | 0.0 | 0.0 |
| painting | 0.0 | 0.0 |
| sofa | 0.0 | 0.0 |
| shelf | 0.0 | 0.0 |
| house | 0.0 | 0.0 |
| sea | 0.0 | 0.0 |
| mirror | 0.0 | 0.0 |
| rug | 0.0 | 0.0 |
| field | 0.0 | 0.0 |
| armchair | 0.0 | 0.0 |
| seat | 0.0 | 0.0 |
| fence | 0.0 | 0.0 |
| desk | 0.0 | 0.0 |
| rock | 0.0 | 0.0 |
| wardrobe | 0.0 | 0.0 |
| lamp | 0.0 | 0.0 |
| bathtub | 0.0 | 0.0 |
| railing | 0.0 | 0.0 |
| cushion | 0.0 | 0.0 |
| base | 0.0 | 0.0 |
| box | 0.0 | 0.0 |
| column | 0.0 | 0.0 |
| signboard | 0.0 | 0.0 |
| chest of drawers | 0.0 | 0.0 |
| counter | 0.0 | 0.0 |
| sand | 0.0 | 0.0 |
| sink | 0.0 | 0.0 |
| skyscraper | 0.0 | 0.0 |
| fireplace | 0.0 | 0.0 |
| refrigerator | 0.0 | 0.0 |
| grandstand | 0.0 | 0.0 |
| path | 0.0 | 0.0 |
| stairs | 0.0 | 0.0 |
| runway | 0.0 | 0.0 |
| case | 0.0 | 0.0 |
| pool table | 0.0 | 0.0 |
| pillow | 0.0 | 0.0 |
| screen door | 0.0 | 0.0 |
| stairway | 0.0 | 0.0 |
| river | 0.0 | 0.0 |
| bridge | 0.0 | 0.0 |
| bookcase | 0.0 | 0.0 |
| blind | 0.0 | 0.0 |
| coffee table | 0.0 | 0.0 |
| toilet | 0.0 | 0.0 |
| flower | 0.0 | 0.0 |
| book | 0.0 | 0.0 |
| hill | 0.0 | 0.0 |
| bench | 0.0 | 0.0 |
| countertop | 0.0 | 0.0 |
| stove | 0.0 | 0.0 |
| palm | 0.0 | 0.0 |
| kitchen island | 0.0 | 0.0 |
| computer | 0.0 | 0.0 |
| swivel chair | 0.0 | 0.0 |
| boat | 0.0 | 0.0 |
| bar | 0.0 | 0.0 |
| arcade machine | 0.0 | 0.0 |
| hovel | 0.0 | 0.0 |
| bus | 0.0 | 0.0 |
| towel | 0.0 | 0.0 |
| light | 0.0 | 0.0 |
| truck | 0.0 | 0.0 |
| tower | 0.0 | 0.0 |
| chandelier | 0.0 | 0.0 |
| awning | 0.0 | 0.0 |
| streetlight | 0.0 | 0.0 |
| booth | 0.0 | 0.0 |
| television receiver | 0.0 | 0.0 |
| airplane | 0.0 | 0.0 |
| dirt track | 0.0 | 0.0 |
| apparel | 0.0 | 0.0 |
| pole | 0.0 | 0.0 |
| land | 0.0 | 0.0 |
| bannister | 0.0 | 0.0 |
| escalator | 0.0 | 0.0 |
| ottoman | 0.0 | 0.0 |
| bottle | 0.0 | 0.0 |
| buffet | 0.0 | 0.0 |
| poster | 0.0 | 0.0 |
| stage | 0.0 | 0.0 |
| van | 0.0 | 0.0 |
| ship | 0.0 | 0.0 |
| fountain | 0.0 | 0.0 |
| conveyer belt | 0.0 | 0.0 |
| canopy | 0.0 | 0.0 |
| washer | 0.0 | 0.0 |
| plaything | 0.0 | 0.0 |
| swimming pool | 0.0 | 0.0 |
| stool | 0.0 | 0.0 |
| barrel | 0.0 | 0.0 |
| basket | 0.0 | 0.0 |
| waterfall | 0.0 | 0.0 |
| tent | 0.0 | 0.0 |
| bag | 0.0 | 0.0 |
| minibike | 0.0 | 0.0 |
| cradle | 0.0 | 0.0 |
| oven | 0.0 | 0.0 |
| ball | 0.0 | 0.0 |
| food | 0.0 | 0.0 |
| step | 0.0 | 0.0 |
| tank | 0.0 | 0.0 |
| trade name | 0.0 | 0.0 |
| microwave | 0.0 | 0.0 |
| pot | 0.0 | 0.0 |
| animal | 0.0 | 0.0 |
| bicycle | 0.0 | 0.0 |
| lake | 0.0 | 0.0 |
| dishwasher | 0.0 | 0.0 |
| screen | 0.0 | 0.0 |
| blanket | 0.0 | 0.0 |
| sculpture | 0.0 | 0.0 |
| hood | 0.0 | 0.0 |
| sconce | 0.0 | 0.0 |
| vase | 0.0 | 0.0 |
| traffic light | 0.0 | 0.0 |
| tray | 0.0 | 0.0 |
| ashcan | 0.0 | 0.0 |
| fan | 0.0 | 0.0 |
| pier | 0.0 | 0.0 |
| crt screen | 0.0 | 0.0 |
| plate | 0.0 | 0.0 |
| monitor | 0.0 | 0.0 |
| bulletin board | 0.0 | 0.0 |
| shower | 0.0 | 0.0 |
| radiator | 0.0 | 0.0 |
| glass | 0.0 | 0.0 |
| clock | 0.0 | 0.0 |
| flag | 0.0 | 0.0 |
+---------------------+-------+-------+
06/15 20:19:41 - mmengine - INFO - Iter(val) [500/500] aAcc: 17.5400 mIoU: 0.1200 mAcc: 0.6700 data_time: 0.0018 time: 0.1460
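A note on reading the table above: the numbers are consistent with the model predicting `wall` (class 0) for every validation pixel. In that case the IoU of `wall` equals the fraction of pixels that are `wall`, which is also aAcc (17.54 in both places), and the means follow as 17.54 / 150 ≈ 0.12 and 100 / 150 ≈ 0.67. The sketch below reproduces that pattern on toy data. The helper `per_class_iou_acc` and the toy labels are made up for illustration; this is not mmsegmentation's `IoUMetric` implementation.

```python
def per_class_iou_acc(pred, gt, num_classes):
    """Per-class IoU and accuracy in percent, from flat lists of class ids."""
    iou, acc = [], []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        pred_c = sum(1 for p in pred if p == c)
        gt_c = sum(1 for g in gt if g == c)
        union = pred_c + gt_c - inter
        iou.append(100.0 * inter / union if union else 0.0)
        acc.append(100.0 * inter / gt_c if gt_c else 0.0)
    return iou, acc


# Toy ground truth with 3 classes; the "model" outputs class 0 everywhere.
gt = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
pred = [0] * len(gt)

iou, acc = per_class_iou_acc(pred, gt, num_classes=3)
a_acc = 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)

print(iou, acc, a_acc)  # class 0: IoU == aAcc and Acc == 100; all other classes 0

# The same arithmetic applied to the numbers reported in the log line.
NUM_CLASSES = 150
reported_miou = 17.54 / NUM_CLASSES
reported_macc = 100.0 / NUM_CLASSES
print(round(reported_miou, 2), round(reported_macc, 2))
```

If that reading is right, the open question is why the validation path collapses to a single class while the training accuracy looks healthy, rather than whether the metric itself is computed incorrectly.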
Relevant packages:
python 3.9.16
pytorch 1.13.1
cudatoolkit 11.6.0
mmcv 2.0.0
mmsegmentation 1.0.0
I built mmsegmentation from source following the instructions in the repo.
Did anyone come across a similar issue? What might be the cause of it?
I'm having the same problem with a custom SegFormer backbone that I obtained by training with the standard configuration in the repo. Has anyone found a solution for this?
I am also getting this error with fcn-unet, with a custom dataset conforming to the standard dataset format.
Edit: However, mine is only binary segmentation, with background and foreground classes.
I'm using the following workaround: since my custom backbone was trained with this framework, instead of starting a new training run I resume the existing one and change the parameters (in my case, the training data and the number of iterations). This works for me as a fine-tuning strategy.
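For anyone wanting to try the same thing, a minimal sketch of what such a config change might look like is below. It assumes the usual mmengine behaviour, where `resume = True` together with `load_from` restores the weights, optimizer state and iteration counter from that checkpoint, while `resume = False` loads the weights only. The checkpoint path and the iteration numbers are placeholders, not values from this thread.

```python
# Hypothetical config fragment; path and numbers are placeholders.

# Checkpoint written by the earlier run that trained the backbone.
load_from = 'work_dirs/my_backbone_run/iter_160000.pth'

# Continue that run instead of starting from iteration 0.
resume = True

# The iteration counter is restored on resume, so max_iters has to be
# larger than the iteration stored in the checkpoint for training to go on.
train_cfg = dict(
    type='IterBasedTrainLoop', max_iters=320000, val_interval=16000)
```

The new training data would be set in `train_dataloader` as usual.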
So just making it a fine-tuned model made the metrics change from zero?
Bumping this: I'm having a similar issue with linear segmentation. The validation mIoU stays essentially the same from the beginning of training:
2025/07/28 11:37:02 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.9400 mIoU: 11.1100 mAcc: 14.8000 data_time: 0.0011 time: 0.0136
2025/07/28 11:56:56 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 68.0500 mIoU: 11.1700 mAcc: 14.8300 data_time: 0.0009 time: 0.0132
2025/07/28 12:16:56 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.9700 mIoU: 11.2400 mAcc: 14.9600 data_time: 0.0009 time: 0.0132
2025/07/28 12:37:01 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 67.8400 mIoU: 11.0200 mAcc: 14.5700 data_time: 0.0009 time: 0.0135
2025/07/28 12:57:11 - mmengine - INFO - Iter(val) [2000/2000] aAcc: 68.0400 mIoU: 10.7700 mAcc: 14.3700 data_time: 0.0009 time: 0.0133
backbone_norm_cfg = dict(eps=1e-06, requires_grad=True, type='LN')
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segmenter/vit_small_p16_384_20220308-410f6037.pth'
crop_size = (
512,
512,
)
data_preprocessor = dict(
bgr_to_rgb=True,
mean=[
123.675,
116.28,
103.53,
],
pad_val=0,
seg_pad_val=255,
size=(
512,
512,
),
std=[
58.395,
57.12,
57.375,
],
type='SegDataPreProcessor')
data_root = '/scratch/work/saritak1/datasets/ADEChallengeData2016'
dataset_type = 'ADE20KDataset'
default_hooks = dict(
checkpoint=dict(by_epoch=False, interval=160000, type='CheckpointHook'),
logger=dict(interval=50, log_metric_by_epoch=False, type='LoggerHook'),
param_scheduler=dict(type='ParamSchedulerHook'),
sampler_seed=dict(type='DistSamplerSeedHook'),
timer=dict(type='IterTimerHook'),
visualization=dict(type='SegVisualizationHook'))
default_scope = 'mmseg'
env_cfg = dict(
cudnn_benchmark=True,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
img_ratios = [
0.5,
0.75,
1.0,
1.25,
1.5,
1.75,
]
launcher = 'pytorch'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
model = dict(
backbone=dict(
attn_drop_rate=0.0,
drop_path_rate=0.1,
drop_rate=0.0,
embed_dims=384,
final_norm=True,
frozen_stages=12,
img_size=(
512,
512,
),
in_channels=3,
init_cfg=dict(
checkpoint=
'/scratch/work/saritak1/checkpoints/dino_Li/converted.pth',
type='Pretrained'),
interpolate_mode='bicubic',
norm_cfg=dict(eps=1e-06, requires_grad=True, type='LN'),
num_heads=6,
num_layers=12,
out_indices=[
11,
],
patch_size=16,
type='FreezableVisionTransformer',
with_cls_token=True),
data_preprocessor=dict(
bgr_to_rgb=True,
mean=[
123.675,
116.28,
103.53,
],
pad_val=0,
seg_pad_val=255,
size=(
512,
512,
),
std=[
58.395,
57.12,
57.375,
],
type='SegDataPreProcessor'),
decode_head=dict(
channels=384,
concat_input=False,
dropout_ratio=0.0,
in_channels=[
384,
],
in_index=[
0,
],
input_transform='resize_concat',
loss_decode=dict(
loss_weight=1.0, type='CrossEntropyLoss', use_sigmoid=False),
num_classes=150,
num_convs=0,
type='FCNHead'),
test_cfg=dict(crop_size=(
512,
512,
), mode='slide', stride=(
480,
480,
)),
type='EncoderDecoder')
optim_wrapper = dict(
clip_grad=None,
optimizer=dict(lr=0.0001, type='Adam', weight_decay=0.05),
type='OptimWrapper')
optimizer = dict(lr=0.0001, type='Adam', weight_decay=0.0)
param_scheduler = [
dict(
begin=0,
by_epoch=False,
end=160000,
eta_min=1e-05,
power=0.9,
type='PolyLR'),
]
resume = True
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
batch_size=1,
dataset=dict(
data_prefix=dict(
img_path='images/validation',
seg_map_path='annotations/validation'),
data_root='/scratch/work/saritak1/datasets/ADEChallengeData2016',
pipeline=[
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
512,
), type='Resize'),
dict(reduce_zero_label=True, type='LoadAnnotations'),
dict(type='PackSegInputs'),
],
type='ADE20KDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
iou_metrics=[
'mIoU',
], type='IoUMetric')
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
512,
), type='Resize'),
dict(reduce_zero_label=True, type='LoadAnnotations'),
dict(type='PackSegInputs'),
]
train_cfg = dict(
max_iters=160000, type='IterBasedTrainLoop', val_interval=16000)
train_dataloader = dict(
batch_size=16,
dataset=dict(
data_prefix=dict(
img_path='images/training', seg_map_path='annotations/training'),
data_root='/scratch/work/saritak1/datasets/ADEChallengeData2016',
pipeline=[
dict(type='LoadImageFromFile'),
dict(reduce_zero_label=True, type='LoadAnnotations'),
dict(
keep_ratio=True,
ratio_range=(
0.5,
2.0,
),
scale=(
2048,
512,
),
type='RandomResize'),
dict(
cat_max_ratio=0.75, crop_size=(
512,
512,
), type='RandomCrop'),
dict(prob=0.5, type='RandomFlip'),
dict(type='PhotoMetricDistortion'),
dict(type='PackSegInputs'),
],
type='ADE20KDataset'),
num_workers=8,
persistent_workers=True,
sampler=dict(shuffle=True, type='InfiniteSampler'))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(reduce_zero_label=True, type='LoadAnnotations'),
dict(
keep_ratio=True,
ratio_range=(
0.5,
2.0,
),
scale=(
2048,
512,
),
type='RandomResize'),
dict(cat_max_ratio=0.75, crop_size=(
512,
512,
), type='RandomCrop'),
dict(prob=0.5, type='RandomFlip'),
dict(type='PhotoMetricDistortion'),
dict(type='PackSegInputs'),
]
tta_model = dict(type='SegTTAModel')
tta_pipeline = [
dict(backend_args=None, type='LoadImageFromFile'),
dict(
transforms=[
[
dict(keep_ratio=True, scale_factor=0.5, type='Resize'),
dict(keep_ratio=True, scale_factor=0.75, type='Resize'),
dict(keep_ratio=True, scale_factor=1.0, type='Resize'),
dict(keep_ratio=True, scale_factor=1.25, type='Resize'),
dict(keep_ratio=True, scale_factor=1.5, type='Resize'),
dict(keep_ratio=True, scale_factor=1.75, type='Resize'),
],
[
dict(direction='horizontal', prob=0.0, type='RandomFlip'),
dict(direction='horizontal', prob=1.0, type='RandomFlip'),
],
[
dict(type='LoadAnnotations'),
],
[
dict(type='PackSegInputs'),
],
],
type='TestTimeAug'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
batch_size=1,
dataset=dict(
data_prefix=dict(
img_path='images/validation',
seg_map_path='annotations/validation'),
data_root='/scratch/work/saritak1/datasets/ADEChallengeData2016',
pipeline=[
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
512,
), type='Resize'),
dict(reduce_zero_label=True, type='LoadAnnotations'),
dict(type='PackSegInputs'),
],
type='ADE20KDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
iou_metrics=[
'mIoU',
], type='IoUMetric')
vis_backends = [
dict(type='LocalVisBackend'),
]
visualizer = dict(
name='visualizer',
type='SegLocalVisualizer',
vis_backends=[
dict(type='LocalVisBackend'),
])
work_dir = '/scratch/work/saritak1/segmentation/output_debug/dino_Li/lr_1e-4/vitb_linear_fcn_ade20k_lr1e-4_12_0.0001_1'
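For reference, the learning-rate schedule in the config above (`PolyLR` with `power=0.9`, `eta_min=1e-05`, `begin=0`, `end=160000`, on top of `lr=0.0001`) can be written in closed form as lr(t) = (base_lr - eta_min) * (1 - t / T)^power + eta_min. The sketch below evaluates it at each validation step; `poly_lr` is a made-up helper that assumes this closed form matches what the scheduler computes, not mmengine code.

```python
BASE_LR = 1e-4      # optimizer lr from the config
ETA_MIN = 1e-5      # eta_min from param_scheduler
POWER = 0.9         # power from param_scheduler
TOTAL_ITERS = 160000  # end - begin from param_scheduler


def poly_lr(step):
    """Closed-form polynomial decay from BASE_LR down to ETA_MIN."""
    step = min(max(step, 0), TOTAL_ITERS)  # clamp to the scheduled range
    decay = (1.0 - step / TOTAL_ITERS) ** POWER
    return (BASE_LR - ETA_MIN) * decay + ETA_MIN


# Learning rate at each validation point (val_interval=16000).
for step in range(0, TOTAL_ITERS + 1, 16000):
    print(f'iter {step:>6}: lr = {poly_lr(step):.3e}')
```

Under that assumption the learning rate only moves between 1e-4 and 1e-5 over the whole run.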