mmrazor
[Bug] Custom Distillation MMSeg CWD loss nan problem
Describe the bug
I trained the segnext_l model as a standard teacher on my own data and kept the resulting checkpoint for distillation (mmseg/cwd) from segnext_l to segnext_tiny. A few iterations after distillation training starts, all losses become NaN and stay NaN for every subsequent iteration.
I am using the latest versions of all packages.
The student model's results also remain 0.
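For reference, here is a minimal sketch of the channel-wise distillation (CWD) idea in plain Python. It is not MMRazor's implementation, and the function names `log_softmax`, `cwd_loss`, and `is_finite_loss` are made up for illustration. Each channel's activations are turned into a distribution over spatial positions with temperature `tau`, and the student is compared to the teacher with a KL divergence. The sketch assumes feature maps are given as a list of channels, each a flat list of activations, and it only shows that a non-finite activation in either feature map is enough to make the loss NaN. It does not identify the cause in this setup.

```python
import math


def log_softmax(values, tau=1.0):
    """Numerically stable log-softmax of values / tau (log-sum-exp trick)."""
    scaled = [v / tau for v in values]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]


def cwd_loss(student, teacher, tau=1.0):
    """KL(teacher || student) per channel over spatial positions.

    student, teacher: list of channels, each a flat list of activations.
    The result is scaled by tau**2 and averaged over channels
    (an illustrative normalisation, assumed for this sketch).
    """
    total = 0.0
    for s_ch, t_ch in zip(student, teacher):
        log_p_s = log_softmax(s_ch, tau)
        log_p_t = log_softmax(t_ch, tau)
        total += sum(math.exp(lt) * (lt - ls)
                     for lt, ls in zip(log_p_t, log_p_s))
    return (tau ** 2) * total / len(student)


def is_finite_loss(value):
    """True if the loss is neither NaN nor infinite."""
    return math.isfinite(value)


# Large but finite activations stay finite thanks to the stable log-softmax.
print(is_finite_loss(cwd_loss([[1000.0, 0.0]], [[0.0, 1000.0]])))

# A single infinite teacher activation makes the whole loss non-finite.
print(is_finite_loss(cwd_loss([[1.0, 2.0]], [[float("inf"), 0.0]])))
```

A check like `is_finite_loss` on the teacher features, the student features, and the loss at the first failing iteration can help narrow down where the non-finite values first appear.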