PGD
Distilling not working with custom dataset
I prepared a custom COCO-format dataset and successfully trained a teacher model and a student model. Here is their performance:
# teacher
# saved to work_dirs/atss_r50_1x_TaiZhou/epoch_last.pth
bbox_mAP: 0.4180, bbox_mAP_50: 0.6940, bbox_mAP_75: 0.4230, bbox_mAP_s: 0.1610, bbox_mAP_m: 0.4310, bbox_mAP_l: 0.6460
# student
# saved to work_dirs/atss_r101_3x_ms_TaiZhou/epoch_last.pth
bbox_mAP: 0.3590, bbox_mAP_50: 0.6540, bbox_mAP_75: 0.3360, bbox_mAP_s: 0.1450, bbox_mAP_m: 0.4020, bbox_mAP_l: 0.5570
However, when I tried to distill using the above teacher and student:
CUDA_VISIBLE_DEVICES=4,5,6,7 bash tools/dist_train.sh work_configs/pgd_atss_r101_r50_1x_TaiZhou.py 4 --work-dir work_dirs/dist_pgd_atss_r101_r50_1x_TaiZhou
the student did not improve at all: its mAP stayed at zero and did not increase as training went on.
My best guess is that the config file is wrong:
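An mAP stuck at zero is often a sign that weights were not loaded into the model as intended, so one check is to compare the parameter names stored in a checkpoint with the names the model expects. Below is a minimal sketch of that comparison. The helper `compare_keys` and the parameter names in the toy example are hypothetical, made up for illustration rather than taken from this repository. The commented-out lines assume the checkpoint is a dict with a `state_dict` entry, which may not hold for every file.

```python
def compare_keys(ckpt_keys, model_keys):
    """Return (missing, unexpected) parameter names, each sorted.

    missing:    names the model expects but the checkpoint lacks.
    unexpected: names in the checkpoint that the model does not have.
    """
    ckpt_keys, model_keys = set(ckpt_keys), set(model_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected


# Toy example with made-up parameter names: the checkpoint carries an extra
# "student." prefix, so nothing matches and no weight would be loaded.
model_keys = ["backbone.conv1.weight", "bbox_head.atss_cls.weight"]
ckpt_keys = ["student.backbone.conv1.weight", "student.bbox_head.atss_cls.weight"]
missing, unexpected = compare_keys(ckpt_keys, model_keys)
print("missing:", missing)
print("unexpected:", unexpected)

# With a real checkpoint (requires PyTorch; `model` is the built detector):
#   import torch
#   ckpt = torch.load("path/to/checkpoint.pth", map_location="cpu")
#   state = ckpt.get("state_dict", ckpt)  # assumption: weights live under "state_dict"
#   missing, unexpected = compare_keys(state.keys(), model.state_dict().keys())
```

If both lists come back empty for the teacher and the student, weight loading is probably not the cause and the config itself is the next thing to inspect.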
```python
# work_configs/pgd_atss_r101_r50_1x_TaiZhou.py
_base_ = "base/1x_setting.py"
temperature = 0.8
alpha = 0.08
delta = 0.0008
beta = alpha * 0.5
gamma = alpha * 1.6
fp16 = dict(loss_scale=512.)
dataset_type = 'MyCocoDataset'
data_root = 'data/MyCoco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True
)
img_resize = (640, 640)
classes = (...) # 8 classes
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        # img_scale=(1333, 800),
        img_scale=img_resize,
        keep_ratio=True
    ),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        # img_scale=(1333, 800),
        img_scale=img_resize,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_train.json',
        img_prefix=data_root + 'train/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_val.json',
        img_prefix=data_root + 'val/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_val.json',
        img_prefix=data_root + 'val/',
        pipeline=test_pipeline)
)
distiller = dict(
    type='PredictionGuidedDistiller',
    teacher_pretrained='work_dirs/atss_r101_3x_ms_TaiZhou/epoch_12.pth',
    init_student=True,
    distill_cfg=[
        # this part was not edited
    ]
)
# I'm sure student_cfg and teacher_cfg use the aforementioned .pth weights
student_cfg = 'work_configs/detectors/atss_r50_distill_head_TaiZhou.py'
teacher_cfg = 'work_configs/detectors/atss_r101_3x_ms_TaiZhou.py'