RepDistiller
questions about ContrastMemory
Hi, according to Eq. 19 in the paper, the linear transforms gT and gS are applied to the teacher and student features, respectively, i.e., gT(t) and gS(s).
But in your code, the teacher transform gT is applied to the student feature, gT(s), and the student transform gS is applied to the teacher feature, gS(t), like:
```python
out_v2 = torch.bmm(weight_v1, v2.view(batchSize, inputSize, 1))  # v2 scored against v1's bank
out_v2 = torch.exp(torch.div(out_v2, T))                         # exp(. / T), T = temperature
out_v1 = torch.bmm(weight_v2, v1.view(batchSize, inputSize, 1))  # v1 scored against v2's bank
out_v1 = torch.exp(torch.div(out_v1, T))
```
and thus your contrastive loss becomes the sum ContrastLoss(out_v1) + ContrastLoss(out_v2).
I wonder why you did this, instead of computing a single output gT(t)*gS(s)/τ (with τ the temperature) as in Eq. 19 and a single ContrastLoss(out).
Thanks.
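For anyone comparing the two formulations, here is a minimal sketch of the difference. All names (g_t, g_s, memory_v1, memory_v2, tau) are illustrative stand-ins, and full memory banks replace the K+1 sampled entries the repo actually uses:

```python
import torch

batch, feat_dim, embed_dim, n_data = 4, 128, 64, 100
tau = 0.07  # temperature; illustrative value

# illustrative transforms g^T, g^S and raw features t, s (not repo names)
g_t = torch.nn.Linear(feat_dim, embed_dim)   # teacher transform g^T
g_s = torch.nn.Linear(feat_dim, embed_dim)   # student transform g^S
t = torch.randn(batch, feat_dim)             # teacher features
s = torch.randn(batch, feat_dim)             # student features

# Eq. 19 style: a single score per pair, exp(gT(t) . gS(s) / tau)
out = torch.exp((g_t(t) * g_s(s)).sum(dim=1) / tau)

# ContrastMemory style: each embedding is scored against a memory bank
# of the other view's embeddings (the banks stand in for the sampled
# weight_v1 / weight_v2 in the repo's bmm calls)
memory_v1 = torch.randn(n_data, embed_dim)      # bank of student embeddings
memory_v2 = torch.randn(n_data, embed_dim)      # bank of teacher embeddings
out_v2 = torch.exp(memory_v1 @ g_t(t).T / tau)  # teacher emb. vs. student bank
out_v1 = torch.exp(memory_v2 @ g_s(s).T / tau)  # student emb. vs. teacher bank
```

So the code optimizes two NCE-style objectives, one anchored on each view, instead of the single objective written in Eq. 19.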
I had the same question. As I understand it, ContrastLoss(out_v2) will not produce any gradients, given that the teacher is not being trained.
The last fc layer on the teacher side is being trained.
Yes, you are right, thanks.
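To make the resolution concrete, here is a quick gradient check (a sketch; embed_t, t_feat, and memory_v1 are illustrative names, assuming a trainable linear projection on top of the frozen teacher, as in the repo). It shows that a loss on out_v2 does update the teacher-side fc layer:

```python
import torch

feat_dim, embed_dim, tau = 128, 64, 0.07

# the teacher backbone is frozen, so its features carry no gradient,
# but the fc projection on top of it (the g^T of Eq. 19) is trainable
embed_t = torch.nn.Linear(feat_dim, embed_dim)  # teacher-side fc layer
t_feat = torch.randn(2, feat_dim)               # frozen backbone output
v2 = embed_t(t_feat)                            # teacher embedding

# stand-in for bmm(weight_v1, v2): score against a fixed student bank
memory_v1 = torch.randn(10, embed_dim)
out_v2 = torch.exp(memory_v1 @ v2.T / tau)

# any scalar loss on out_v2 back-propagates into embed_t
out_v2.mean().backward()
print(embed_t.weight.grad is not None)  # True: the teacher fc layer does train
```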