SegNeXt
Encoder-Decoder downsamples by 8x, which is too coarse for 'seg_logits'
I have an original input of 832(H)x1280(W) (after padding), but seg_logits in ham_head is downsampled to a 1024(C)x104(H)x160(W) feature map. That is too coarse.
You can see that the animal's boundaries in the segmentation are not sharp; this is probably caused by too much downsampling. I would appreciate some guidance.
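For context on why the output looks soft: 104x160 is exactly the stride-8 backbone output (832/8 = 104, 1280/8 = 160), and at inference mmseg bilinearly interpolates seg_logits back to the input size, so boundary detail finer than roughly 8 px is interpolated rather than predicted. A minimal PyTorch sketch of that upsampling (shapes illustrative; mmseg uses its own resize wrapper around the same interpolation):

import torch
import torch.nn.functional as F

H, W = 832, 1280                             # padded input size
logits = torch.randn(1, 3, H // 8, W // 8)   # decode-head output: 1x3x104x160
full = F.interpolate(logits, size=(H, W),
                     mode='bilinear', align_corners=False)
print(full.shape)                            # torch.Size([1, 3, 832, 1280])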
Also, the background occupies most of the image, which makes it hard to optimize the segmentation of the other classes. How can the model be tuned to overcome this, for example via class_weight? Here is my config:
# tools/dist_train.sh segnext.large.ratmetric.py 4
# python tools/train.py segnext.large.ratmetric.py
_base_ = [
    'local_configs/segnext/large/segnext.large.512x512.coco_stuff164k.80k.py'
]
num_classes = 3
# load_from = None
load_from = 'work_dirs/segnext.large.ratmetric/latest.pth'
model = dict(
    backbone=dict(
        init_cfg=dict(
            type='Pretrained',
            checkpoint='pretrained/segnext_large_512x512_ade_160k.pth')),
    decode_head=dict(
        num_classes=num_classes,
        loss_decode=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            class_weight=[1.0 / 50, 1.0, 1.0],
            loss_weight=1.0)))
runner = dict(type='IterBasedRunner', max_iters=6400)
checkpoint_config = dict(by_epoch=False, interval=800)
evaluation = dict(interval=800, metric='mIoU')
data_root = 'data_rat_metric'
img_dir = 'images'
ann_dir = 'annotations'
img_wh = (1280, 832)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=img_wh, ratio_range=(0.7, 1.5)),
    dict(type='RandomCrop', crop_size=img_wh[::-1], cat_max_ratio=1.0, ignore_index=0),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=img_wh[::-1], pad_val=0, seg_pad_val=0),
    # dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_wh,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=train_pipeline),
    val=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline),
    test=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline))
You can use class_weight to address the sample imbalance, or use OHEM. You can also change in_index to [0, 1, 2, 3]; see the sketch below.
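A minimal config sketch of both suggestions (assuming MSCAN-Large stage channels of [64, 128, 320, 512]; check the backbone's embed_dims in your base config before copying). The ham head resizes all selected feature levels to the size of the first one, so including stage 0 makes it run at stride 4 instead of stride 8, and OHEMPixelSampler concentrates the loss on hard pixels rather than the easy background:

model = dict(
    decode_head=dict(
        in_channels=[64, 128, 320, 512],  # must match the selected backbone stages
        in_index=[0, 1, 2, 3],            # add the stride-4 stage-0 feature
        # mine hard pixels instead of averaging over the dominant background
        sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=100000)))

Note that the head then runs on a 4x larger feature map, so expect noticeably higher memory use and slower iterations.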
OK, I see the imbalance solution. And how do I change the feature map size in ham_head? I think it's too coarse.
I ran into this issue too. A simple way to solve it is to modify the source at the downsampling location, as sketched below.
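For example, a hypothetical sketch based on SegNeXt's LightHamHead.forward, which resizes every selected level to inputs[0]'s spatial size, i.e. stride 8 (verify against your copy of the ham_head source before editing):

# sketch of LightHamHead.forward, not a drop-in patch
def forward(self, inputs):
    inputs = self._transform_inputs(inputs)
    # the original resizes every level to inputs[0].shape[2:] (stride 8);
    # doubling the target size makes the head predict at stride 4 instead
    target_size = [s * 2 for s in inputs[0].shape[2:]]
    inputs = [
        resize(level, size=target_size, mode='bilinear',
               align_corners=self.align_corners)
        for level in inputs
    ]
    x = self.squeeze(torch.cat(inputs, dim=1))
    x = self.hamburger(x)
    output = self.align(x)
    output = self.cls_seg(output)
    return output

If you would rather not patch the source, the in_index=[0, 1, 2, 3] route above achieves a similar resolution increase and additionally feeds real stride-4 features into the head instead of interpolated ones.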