RepDistiller
How do you choose the optimal hyper-parameters?
There are several kinds of hyper-parameters:
- teacher model hyper-parameters
- student model hyper-parameters
- KD hyper-parameters (e.g., balance weights for the different losses)
- training hyper-parameters (e.g., learning rate)
It is hard to enumerate every combination, because the number of combinations explodes. How do you find the best (or at least a near-optimal) set of hyper-parameters?
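To make the explosion concrete, here is a rough sketch that counts the runs a full grid would need. The hyper-parameter names and candidate values are made up for illustration; they are not taken from the repository.

```python
from itertools import product
from math import prod

# Hypothetical candidate values, chosen only for illustration.
search_space = {
    "teacher_depth": [56, 110],
    "student_depth": [8, 20, 32],
    "kd_loss_weight": [0.1, 0.5, 1.0, 5.0],
    "ce_loss_weight": [0.1, 0.5, 1.0],
    "temperature": [1, 4, 8],
    "learning_rate": [0.01, 0.05, 0.1],
}


def grid_size(space):
    """Number of training runs a full grid search would need."""
    return prod(len(values) for values in space.values())


def iter_grid(space):
    """Yield every combination in the grid as a dict."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


n_runs = grid_size(search_space)
print(n_runs)  # 2 * 3 * 4 * 3 * 3 * 3 = 648
```

Even with only two to four candidates per hyper-parameter, this small grid already needs 648 full training runs, and each additional hyper-parameter multiplies that number again.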
Thanks!
Same question. I just do not know how to set the weights for the different losses.
Hi, were you able to figure out a good set of hyperparameters?