DenseTeacher

coco-p1 training diverges

Open HaojieYuu opened this issue 2 years ago • 8 comments

I tried to reproduce the results under the coco-p1 configuration, but training diverged after 40k steps and I got only 14% mAP, which is far lower than 19.64%. Could you help me, please?

HaojieYuu avatar Sep 16 '22 07:09 HaojieYuu

Training with such a small amount of supervision is sensitive to hyper-parameters; please try batch size 8 and logits weight 3.
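(A minimal sketch of those overrides, assuming a cvpods/detectron2-style config node like the one quoted later in this thread; the `DISTILL.LOGITS_WEIGHT` key is hypothetical, so check DenseTeacher's actual config for the real name:)

```python
from types import SimpleNamespace

# Hypothetical stand-in for the repo's cfg node; key names are assumptions.
cfg = SimpleNamespace(SOLVER=SimpleNamespace(), DISTILL=SimpleNamespace())
cfg.SOLVER.IMS_PER_BATCH = 8      # total batch size across all GPUs, as suggested above
cfg.DISTILL.LOGITS_WEIGHT = 3.0   # hypothetical key for the logits-loss weight
```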

ZRandomize avatar Sep 20 '22 03:09 ZRandomize

Just corrected the config in the latest commit.

ZRandomize avatar Sep 20 '22 03:09 ZRandomize

Thanks for the reply; I will try the latest code.

HaojieYuu avatar Sep 20 '22 03:09 HaojieYuu

I just tried the latest config, and I added IMS_PER_DEVICE=1 to avoid the assert below:

```python
def adjust_config(cfg):
    base_world_size = int(cfg.SOLVER.IMS_PER_BATCH / cfg.SOLVER.IMS_PER_DEVICE)
    # Batchsize, learning rate and max_iter in original config is used for 8 GPUs
    assert base_world_size == 8, "IMS_PER_BATCH/DEVICE in config file is used for 8 GPUs"
```
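(For context, a minimal sketch of why IMS_PER_DEVICE=1 satisfies that assert with the batch size 8 suggested above; the values are assumptions taken from this thread, not the repo's defaults:)

```python
IMS_PER_BATCH = 8    # total batch size suggested earlier in the thread
IMS_PER_DEVICE = 1   # images per GPU, as set above
base_world_size = int(IMS_PER_BATCH / IMS_PER_DEVICE)  # 8
assert base_world_size == 8  # passes: the config assumes an 8-GPU world size
```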

But the training still diverged after about 40k steps. I got a higher result, 16% mAP, but it's still much lower than 19.64%. I noticed that coco-p1 doesn't use multi-scale training; will that influence the final result?

HaojieYuu avatar Sep 23 '22 04:09 HaojieYuu

Indeed... Thanks for the correction, I'll fix it. Multi-scale training affects performance a lot. Please use

```python
SUPERVISED=(WeakAug, dict(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style="choice")),
```

to align with previous works like Unbiased Teacher, or

```python
SUPERVISED=(WeakAug, dict(short_edge_length=(640, 800), max_size=1333, sample_style="range")),
```

for higher performance.
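(A minimal sketch of the difference between the two `sample_style` values, following detectron2-style `ResizeShortestEdge` semantics; `sample_short_edge` is a hypothetical helper, not the repo's code:)

```python
import random

def sample_short_edge(short_edge_length, sample_style):
    """Pick the target shortest-edge length for one training image."""
    if sample_style == "choice":
        # Pick one of the listed scales, e.g. (640, 672, ..., 800).
        return random.choice(short_edge_length)
    # "range": sample uniformly between the min and max, e.g. (640, 800).
    return random.randint(short_edge_length[0], short_edge_length[1])
```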

ZRandomize avatar Sep 23 '22 05:09 ZRandomize

Thanks for your reply. I get 18.49 mAP now, but it's still about 1 point lower than the score presented in the paper (19.64 ± 0.34). Can this fluctuation be considered normal? Besides, I noticed that both the student model and the teacher model are evaluated twice, but I can't find where the problem is. The problem can be reproduced with the latest code; could you please help?

HaojieYuu avatar Oct 12 '22 07:10 HaojieYuu

Seems it's close; our curve looks like this: [training-curve screenshot] We make the model evaluate both teacher and student every 2k iterations, and we report the performance of the teacher.
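(A minimal sketch of that intended schedule, with hypothetical hook and evaluator names; note it implies two inference passes per evaluation point, one per model:)

```python
EVAL_PERIOD = 2000  # evaluate every 2k iterations

def after_step(iteration, teacher, student, evaluator):
    # Intended behavior per the comment above: each model is evaluated once
    # per period, so a single evaluation point runs inference twice in total.
    if iteration % EVAL_PERIOD == 0:
        teacher_map = evaluator.evaluate(teacher)  # the reported number
        student_map = evaluator.evaluate(student)  # tracked for monitoring only
        return teacher_map, student_map
```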

ZRandomize avatar Oct 12 '22 07:10 ZRandomize

Thanks for your detailed reply! In my situation, inference is carried out 4 times every 2k iterations, not 2: both the teacher and the student models are evaluated twice, which is bizarre. I didn't modify the code; could you reproduce this problem with the official code?

HaojieYuu avatar Oct 12 '22 07:10 HaojieYuu