RepDistiller
Question about the order of normalization in the Similarity-Preserving loss
In the Similarity-Preserving paper, the normalization is applied before the matrix multiplication. Does the order matter for the performance?
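To pin down the comparison, both snippets below compute the same loss (my transcription of the code; "normalize" is PyTorch's row-wise L2 normalization, i.e. F.normalize with dim=1):

$$\mathcal{L} = \frac{1}{b^{2}}\,\lVert G_t - G_s \rVert_F^{2}$$

The first variant builds the Gram matrix and then normalizes its rows,

$$G = \operatorname{normalize}(f f^{\top}), \qquad \operatorname{normalize}(A)_{[i,:]} = \frac{A_{[i,:]}}{\lVert A_{[i,:]} \rVert_2},$$

while the second normalizes the feature rows before the product,

$$G = \hat f \hat f^{\top}, \qquad \hat f_{[i,:]} = \frac{f_{[i,:]}}{\lVert f_{[i,:]} \rVert_2}.$$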
import torch

org_f_s = torch.rand((64, 96))  # stand-in student features: batch of 64, flattened dim 96
org_f_t = torch.rand((64, 96))  # stand-in teacher features
bsz = org_f_s.shape[0]

# Variant 1: build the Gram matrices first, then normalize.
f_s = org_f_s.view(bsz, -1)
f_t = org_f_t.view(bsz, -1)
G_s = torch.mm(f_s, torch.t(f_s))
# G_s = G_s / G_s.norm(2)
G_s = torch.nn.functional.normalize(G_s)  # rescales each row of G_s to unit L2 norm
G_t = torch.mm(f_t, torch.t(f_t))
# G_t = G_t / G_t.norm(2)
G_t = torch.nn.functional.normalize(G_t)
G_diff = G_t - G_s
loss = (G_diff * G_diff).view(-1, 1).sum(0) / (bsz * bsz)  # squared difference averaged over b*b entries
print(loss)
# Variant 2: normalize the feature rows first, then build the Gram matrices.
f_s = org_f_s.view(bsz, -1)
f_t = org_f_t.view(bsz, -1)
f_s = torch.nn.functional.normalize(f_s)  # unit L2 norm per sample
G_s = torch.mm(f_s, torch.t(f_s))  # entries are now cosine similarities
# G_s = G_s / G_s.norm(2)
f_t = torch.nn.functional.normalize(f_t)
G_t = torch.mm(f_t, torch.t(f_t))
# G_t = G_t / G_t.norm(2)
G_diff = G_t - G_s
loss = (G_diff * G_diff).view(-1, 1).sum(0) / (bsz * bsz)
print(loss)
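For what it's worth, the two orderings are not equivalent in general: normalizing the features first yields a cosine-similarity matrix with ones on the diagonal, whereas row-normalizing the Gram matrix afterwards rescales each row to unit L2 norm, so its diagonal is generally below one. A small check, continuing the snippet above:

# Features normalized first -> cosine-similarity Gram matrix, diagonal is exactly 1.
f_hat = torch.nn.functional.normalize(org_f_s.view(bsz, -1))
print(torch.mm(f_hat, torch.t(f_hat)).diag()[:4])  # ~tensor([1., 1., 1., 1.])

# Gram matrix normalized afterwards -> unit-norm rows, diagonal below 1 in general.
G = torch.mm(org_f_s.view(bsz, -1), torch.t(org_f_s.view(bsz, -1)))
G = torch.nn.functional.normalize(G)
print(G.diag()[:4])       # values < 1
print(G.norm(dim=1)[:4])  # ~tensor([1., 1., 1., 1.])

So the two printed loss values will differ; whether that gap translates into a difference in distillation accuracy is exactly the question.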