CrossKD
How are BNs in the teacher model handled?
Hi, while the teacher model is frozen, how are the BN layers in the teacher handled?
- Do the BNs use the statistics of the current data batch, i.e., training mode but with no grad?
- Or do the BNs use the running statistics, i.e., eval mode?
- Are all the teacher's BNs treated the same, or are the BNs in the head (which also process the student features) handled differently from those in the backbone?
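
To make the first two options concrete, here is a minimal PyTorch sketch of the difference I mean. It uses a standalone `nn.BatchNorm2d` layer with arbitrary, assumed tensor shapes; it is not taken from the CrossKD code and says nothing about which option the repo actually uses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A single BN layer standing in for one of the teacher's BNs (assumption:
# 4 channels, default momentum and eps).
bn = nn.BatchNorm2d(4)
for p in bn.parameters():
    p.requires_grad_(False)  # "frozen": affine weight and bias get no gradients

# Input whose statistics differ from the initial running stats (mean 0, var 1).
x = torch.randn(8, 4, 16, 16) * 3.0 + 5.0

# Option 1: training mode without gradients.
# The layer normalizes with the batch statistics and still updates its
# running_mean / running_var buffers, even inside torch.no_grad().
bn.train()
before = bn.running_mean.clone()
with torch.no_grad():
    y_train = bn(x)
running_stats_changed = not torch.equal(before, bn.running_mean)

# Option 2: eval mode.
# The layer normalizes with the stored running statistics and leaves the
# buffers untouched.
bn.eval()
before = bn.running_mean.clone()
with torch.no_grad():
    y_eval = bn(x)
running_stats_unchanged = torch.equal(before, bn.running_mean)

print("train mode updated running stats:", running_stats_changed)
print("eval mode kept running stats:", running_stats_unchanged)
print("outputs identical:", torch.allclose(y_train, y_eval))
```

So setting `requires_grad=False` and wrapping the forward pass in `torch.no_grad()` is not enough to pin down the behaviour; what matters is whether `.eval()` is called on the teacher's BN layers, and whether that holds for the head when it is fed student features.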
Thanks!