Efficient-Computing
Why is the cwd_loss 0 when I use your self-distill command?
I followed your work on Gold-YOLO and used the command to train a teacher model, then used the self-distill command to train the student model. But the cwd_loss is 0, as shown below!
Can you give me more details, like your model config and data yaml? Does this affect the mIoU of the model?
The model is gold-yolo-s and the dataset is COCO 2017. I have figured out why the cwd_loss is 0 (because --distill_feat was not used), but I still want to know why the cls_loss is negative. Also, the self-distill performance seems to be lower.
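For anyone else confused by the zero cwd_loss, here is a minimal sketch of how the distillation terms are presumably combined (assumed names and structure, not the repo's actual code): the CWD feature term only contributes when --distill_feat is enabled, so otherwise it is simply logged as 0.

```python
# Minimal sketch (assumed names, not the repo's actual code) of how the
# self-distillation loss terms are combined.
def total_loss(det_loss, kd_cls_loss, cwd_loss, distill_feat=False):
    loss = det_loss + kd_cls_loss
    if distill_feat:
        # The channel-wise distillation (CWD) feature term is only computed and
        # added when --distill_feat is enabled; otherwise it stays 0 in the logs.
        loss = loss + cwd_loss
    return loss
```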
I have the same situation: the self-distill result is lower than the original.
Is using both --distill and --distill_feat better than using just --distill?
Please give specific training commands and model config.
This may be because the teacher checkpoint is not aligned with the model, resulting in incorrect labels generated by the teacher. You can set strict=True in load_state_dict here to check that your teacher model checkpoint is correct.
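In case it helps others hitting the same issue, a minimal sketch of that check (the function name and the 'model' checkpoint key are assumptions based on typical YOLOv6-style checkpoints, not the repo's exact layout):

```python
import torch
from torch import nn

def check_teacher_ckpt(model: nn.Module, ckpt_path: str) -> None:
    """Load a teacher checkpoint with strict=True so any key mismatch raises."""
    ckpt = torch.load(ckpt_path, map_location='cpu')
    # YOLOv6-style checkpoints often store the model object under the 'model' key;
    # fall back to treating the file as a plain state_dict otherwise.
    if isinstance(ckpt, dict) and 'model' in ckpt and hasattr(ckpt['model'], 'state_dict'):
        state_dict = ckpt['model'].float().state_dict()
    else:
        state_dict = ckpt
    # strict=True raises a RuntimeError on missing or unexpected keys instead of
    # silently ignoring them, which exposes a teacher checkpoint that does not
    # match the model definition.
    model.load_state_dict(state_dict, strict=True)
```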
How many epochs did you use to train the teacher and student models?
300 and 300