TorchDistiller
KLD in Object Detection
Hi, irfanlCMLL.
Thanks for kindly sharing this repository. During my training phase, I observed that the CWD losses (cwd_0, ..., cwd_3) did not decrease, even though they had very high values. I suspect that the KL divergence loss can be very hard to reduce. (I ran my experiments on this repo: https://github.com/pppppM/mmdetection-distiller)
Do you agree with my assumption? What about using other losses (L2, L1, cosine similarity) in place of the KL divergence for CWD?
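For context, a minimal sketch of what a channel-wise distillation (CWD) loss typically computes: a softmax over the spatial positions of each channel (with a temperature), followed by a KL divergence between the teacher and student distributions. This is an assumption about the loss being discussed, based on the common CWD formulation, not the exact code from either repository; the function name `cwd_loss` and the temperature `tau` are illustrative.

```python
import torch
import torch.nn.functional as F

def cwd_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor,
             tau: float = 4.0) -> torch.Tensor:
    """Channel-wise distillation: per-channel spatial softmax + KL divergence.

    student_feat, teacher_feat: feature maps of shape (N, C, H, W).
    tau: softmax temperature (hypothetical default for illustration).
    """
    n, c, h, w = student_feat.shape
    # Flatten so the softmax is taken over the H*W spatial positions of each channel
    s = student_feat.reshape(n * c, h * w)
    t = teacher_feat.reshape(n * c, h * w)
    # KL(teacher || student); the tau**2 factor is the usual temperature scaling
    return F.kl_div(
        F.log_softmax(s / tau, dim=1),
        F.softmax(t / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)
```

Note that because the softmax normalizes each channel separately, a large raw loss value can persist whenever the student's per-channel spatial distributions stay far from the teacher's, which may be consistent with the plateau described above.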