MIC
Teacher model for inference
Hi. I have a question about inference. As I understand it, the teacher model is more robust than the student model because of the EMA update, and it is used for supervision (pseudo-labels). Why, then, do you use the student model for inference? What would be the problem with using the teacher model for inference?
Hi @kimkj38,
I did not evaluate the teacher's performance for MIC. However, in previous UDA studies, we found that the student and teacher perform similarly at the end of training. The EMA teacher is particularly important at the beginning of training, when the network learns quickly and its predictions change rapidly; there it provides temporally stable pseudo-labels for stable self-training. Once training converges and the learning rate has decayed, there is less instability, and the student and teacher converge to similar predictions.
Best, Lukas
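For readers who want to see the effect described above, here is a minimal toy sketch in plain Python. It is not MIC's training code: the single scalar "weight", the EMA factor `alpha`, the linear learning-rate decay, and the alternating perturbation that stands in for noisy updates are all assumptions chosen for illustration. It only shows that the gap between an EMA teacher and its student tends to be large early on and to shrink as the learning rate decays.

```python
def ema_update(teacher, student, alpha):
    """One EMA step: teacher <- alpha * teacher + (1 - alpha) * student."""
    return alpha * teacher + (1.0 - alpha) * student


def simulate(steps=1000, alpha=0.9, lr0=0.5, target=1.0):
    """Toy one-parameter 'training' run (hypothetical setup, not MIC).

    The student moves toward `target` with a linearly decaying learning rate
    and an alternating perturbation that mimics noisy updates. The teacher is
    the EMA of the student. Returns a list of (step, student, teacher, gap).
    """
    student = 0.0
    teacher = 0.0
    history = []
    for step in range(steps):
        lr = lr0 * (1.0 - step / steps)          # learning rate decays toward 0
        noise = 0.5 if step % 2 == 0 else -0.5   # deterministic stand-in for noise
        student = student + lr * (target - student) + lr * noise
        teacher = ema_update(teacher, student, alpha)
        history.append((step, student, teacher, abs(student - teacher)))
    return history


if __name__ == "__main__":
    history = simulate()
    for step, student, teacher, gap in (history[0], history[len(history) // 2], history[-1]):
        print(f"step {step:4d}  student={student:.4f}  teacher={teacher:.4f}  gap={gap:.4f}")
```

In this toy setting the gap is largest in the first steps, where the student changes quickly, and becomes small once the step size has decayed. That matches the qualitative argument in the reply, but it says nothing about the actual numbers for MIC, which were not evaluated.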