Efficient-Computing
Why is the cwd_loss 0 when I use your self-distill command?
I followed your work on Gold-YOLO and used the command to train a teacher model, then used the self-distill command to train the student model. But the cwd_loss is 0, as shown below!
Can you give me more details, like your model config and data yaml? Does this affect the mIoU of the model?
The model is gold-yolo-s and the dataset is COCO 2017. I have figured out why the cwd_loss is 0 (because --distill_feat was not used), but I still want to know why the cls_loss is negative. Also, the self-distill performance seems to be lower.
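For anyone else confused by the zero cwd_loss, here is a minimal sketch of how the distillation terms are presumably combined (assumed names and structure, not the repo's actual code): the CWD feature term only contributes when --distill_feat is enabled, so otherwise it is simply logged as 0.

```python
# Minimal sketch (assumed names, not the repo's actual code) of how the
# self-distillation loss terms are combined.
def total_loss(det_loss, kd_cls_loss, cwd_loss, distill_feat=False):
    loss = det_loss + kd_cls_loss
    if distill_feat:
        # The channel-wise distillation (CWD) feature term is only computed and
        # added when --distill_feat is enabled; otherwise it stays 0 in the logs.
        loss = loss + cwd_loss
    return loss
```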
I have the same situation: the self-distill result is lower than the original.
Is using both --distill and --distill_feat better than using just --distill?
Please give specific training commands and model config.
This may be because the teacher checkpoint is not aligned with the model, resulting in incorrect labels generated by the teacher. You can set strict=True in load_state_dict here to check that your teacher model checkpoint is correct.
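In case it helps others hitting the same issue, a minimal sketch of that check (the function name and the 'model' checkpoint key are assumptions based on typical YOLOv6-style checkpoints, not the repo's exact layout):

```python
import torch
from torch import nn

def check_teacher_ckpt(model: nn.Module, ckpt_path: str) -> None:
    """Load a teacher checkpoint with strict=True so any key mismatch raises."""
    ckpt = torch.load(ckpt_path, map_location='cpu')
    # YOLOv6-style checkpoints often store the model object under the 'model' key;
    # fall back to treating the file as a plain state_dict otherwise.
    if isinstance(ckpt, dict) and 'model' in ckpt and hasattr(ckpt['model'], 'state_dict'):
        state_dict = ckpt['model'].float().state_dict()
    else:
        state_dict = ckpt
    # strict=True raises a RuntimeError on missing or unexpected keys instead of
    # silently ignoring them, which exposes a teacher checkpoint that does not
    # match the model definition.
    model.load_state_dict(state_dict, strict=True)
```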
How many epochs did you use to train the teacher and student models?
300 and 300