TorchDistiller
Is it better to combine CWD loss with other losses than just CWD loss?
Hi!
The README mentions that a model can be trained with channel-wise distillation (CWD), GAN loss, and pixel-wise distillation together.
Does combining CWD loss with these other losses give better results than using CWD loss alone?
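To make the question concrete, here is a minimal sketch of what I mean by "combining" the losses. It is not taken from the TorchDistiller code: the function names, the weights `w_cwd` and `w_pi`, and the temperature `tau` are placeholders I made up, and the GAN term is left out to keep it short.

```python
import torch
import torch.nn.functional as F


def cwd_loss(student, teacher, tau=4.0):
    """KL divergence between per-channel spatial distributions (assumed CWD form)."""
    n, c, h, w = student.shape
    # Each channel is flattened into a distribution over its H*W locations.
    s = F.log_softmax(student.reshape(n * c, h * w) / tau, dim=1)
    t = F.softmax(teacher.detach().reshape(n * c, h * w) / tau, dim=1)
    # Sum the KL terms, then average over batch and channels.
    return F.kl_div(s, t, reduction="sum") * (tau ** 2) / (n * c)


def pixelwise_loss(student, teacher):
    """KL divergence between per-pixel class distributions (assumed pixel-wise form)."""
    n, c, h, w = student.shape
    s = F.log_softmax(student, dim=1)
    t = F.softmax(teacher.detach(), dim=1)
    # Sum the KL terms, then average over batch and pixels.
    return F.kl_div(s, t, reduction="sum") / (n * h * w)


def combined_loss(student, teacher, target, w_cwd=1.0, w_pi=1.0):
    """Task loss plus weighted distillation terms; the weights are hypothetical."""
    task = F.cross_entropy(student, target)
    return (task
            + w_cwd * cwd_loss(student, teacher)
            + w_pi * pixelwise_loss(student, teacher))
```

Setting `w_pi=0.0` gives the "CWD only" case, so the question is whether a non-zero weight on the extra terms is expected to help.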