Questions about ContrastMemory

Open · jianxiangm opened this issue 4 years ago · 3 comments

Hi, according to Eq. 19 in the paper, the linear transforms gT and gS are applied to the teacher and student features, respectively, i.e., gT(t) and gS(s).

But in your code, the teacher transform gT is applied to the student feature, gT(s), and the student transform gS is applied to the teacher feature, gS(t):

```python
out_v2 = torch.bmm(weight_v1, v2.view(batchSize, inputSize, 1))
out_v2 = torch.exp(torch.div(out_v2, T))
out_v1 = torch.bmm(weight_v2, v1.view(batchSize, inputSize, 1))
out_v1 = torch.exp(torch.div(out_v1, T))
```

and thus the contrastive loss becomes the sum ContrastLoss(out_v1) + ContrastLoss(out_v2).
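For concreteness, here is a minimal self-contained sketch of that structure (toy dimensions, with random tensors standing in for the memory bank; the names v1, v2, weight_v1, weight_v2, and T follow the snippet above, everything else is made up for illustration):

```python
import torch
import torch.nn as nn

batchSize, inputSize, K, T = 4, 128, 16, 0.07

embed_s = nn.Linear(256, inputSize)  # plays the role of gS (trainable)
embed_t = nn.Linear(512, inputSize)  # plays the role of gT (trainable)

v1 = embed_s(torch.randn(batchSize, 256))  # student embedding
v2 = embed_t(torch.randn(batchSize, 512))  # teacher embedding

# Memory-bank entries are buffers with no gradient; random placeholders
# here stand in for the index_select lookups on the bank.
weight_v1 = torch.randn(batchSize, K + 1, inputSize)  # student-side bank
weight_v2 = torch.randn(batchSize, K + 1, inputSize)  # teacher-side bank

# The cross-pairing described above: the teacher embedding is scored
# against the student-side bank, and vice versa.
out_v2 = torch.exp(torch.div(torch.bmm(weight_v1, v2.view(batchSize, inputSize, 1)), T))
out_v1 = torch.exp(torch.div(torch.bmm(weight_v2, v1.view(batchSize, inputSize, 1)), T))

# Final objective: the sum of the two symmetric terms,
# loss = ContrastLoss(out_v1) + ContrastLoss(out_v2)
```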

I wonder why you did this, instead of computing a single output as in Eq. 19, i.e., gT(t) * gS(s) / T (with temperature T), and then a single ContrastLoss(out).

Thanks.

jianxiangm · May 11 '20

I had the same question. As I understand it, ContrastLoss(out_v2) will not produce any gradients, given that the teacher is not being trained.

yassouali · Jun 22 '20

> I had the same question. As I understand it, ContrastLoss(out_v2) will not produce any gradients, given that the teacher is not being trained.

The last fc layer on the teacher side (the linear embedding that produces v2, i.e., gT) is being trained.
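A quick toy check (hypothetical shapes, not the repo's code) shows that gradients do flow into that layer through out_v2:

```python
import torch
import torch.nn as nn

embed_t = nn.Linear(512, 128)           # gT: the trainable fc on the teacher side
f_t = torch.randn(4, 512)               # frozen teacher backbone output (no grad)
v2 = embed_t(f_t)                       # grad flows back into embed_t

weight_v1 = torch.randn(4, 17, 128)     # memory-bank entries (buffers, no grad)
out_v2 = torch.exp(torch.bmm(weight_v1, v2.view(4, 128, 1)) / 0.07)

out_v2.sum().backward()                 # stand-in for ContrastLoss(out_v2)
print(embed_t.weight.grad is not None)  # prints True
```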

jianxiangm · Jun 23 '20

Yes, you're right, thanks.

yassouali · Jun 23 '20