Bowen Zheng
> OK, I get your point: you mean that mathematically `.sum(1)` is the correct implementation and `.mean(1) = .sum(1) / 16`.

That's true, but how is it related to `batchmean`? BTW, I also found that...
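To make the relation between `.sum(1)`, `.mean(1)`, and `batchmean` concrete, here is a minimal pure-Python sketch. It assumes dimension 1 is a class axis of size 16 and uses made-up teacher/student distributions; the names `teacher`, `student`, `BATCH`, and `CLASSES` are illustrative, not taken from the code under discussion. In PyTorch, `reduction='batchmean'` for the KL divergence loss divides the summed loss by the batch size, which is what the last lines mimic.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

BATCH, CLASSES = 4, 16  # assumed shapes; 16 matches the factor in the comment

# Hypothetical, deterministic teacher and student distributions.
teacher = [softmax([math.sin(i + j) for j in range(CLASSES)]) for i in range(BATCH)]
student = [softmax([math.cos(i * j) for j in range(CLASSES)]) for i in range(BATCH)]

# Pointwise KL terms p * (log p - log q), shape [BATCH][CLASSES].
pointwise = [
    [p * (math.log(p) - math.log(q)) for p, q in zip(t_row, s_row)]
    for t_row, s_row in zip(teacher, student)
]

row_sum = [sum(row) for row in pointwise]             # analogue of .sum(1): true per-sample KL
row_mean = [sum(row) / CLASSES for row in pointwise]  # analogue of .mean(1) = .sum(1) / 16

# 'batchmean'-style reduction: total sum divided by the batch size only.
batchmean = sum(row_sum) / BATCH
# 'mean'-style reduction: total sum divided by every element.
elementwise_mean = sum(row_sum) / (BATCH * CLASSES)

print(batchmean, elementwise_mean)
```

Under these assumptions, averaging `.sum(1)` over the batch reproduces the `batchmean` value, while averaging `.mean(1)` gives the element-wise mean, which is smaller by exactly the class count.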
FYI: I recorded the factor ratio `avg_factor/(self.reg_max+1)` during training. Maybe it will help this discussion.
> It's an intended behavior because experiments show that not dividing works better. I don't know the theory behind this, though.

I see, thanks for the reply.
> The hyperparameters should be the same between different teacher-student pairs. We simply reported the results in CRD's original paper.

@Zzzzz1 Thanks for the reply, this is very helpful. I...
@winycg I think you can revert commit 31ae0d6; that should resolve the conflict.