
The implementation of MultiClassCrossEntropy seems incorrect

Open b224618 opened this issue 3 years ago • 3 comments

Hi, thanks a lot for sharing this implementation, but I found the implementation of MultiClassCrossEntropy confusing. The issues are two-fold:

  1. The computation of the outputs and labels. In the original LwF paper, 1/T appears as an exponent on the recorded probabilities (which are then renormalized), not as a multiplicative factor.
  2. In the `return Variable(outputs.data, requires_grad=True).cuda()` line, the new Variable loses the computation graph that produced the output, so optimizing this output will not update the model. As a result, the dist_loss in line 155 of model.py does not affect the results, and performance is unchanged after removing this term (see the sketch below).

Apologies if there is any mistake in my understanding, and thanks again for sharing the implementation!
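For reference, here is a minimal sketch of the graph-detachment issue in point 2. This is not the repo's exact code; the toy model, data, and function names are hypothetical, and only the `Variable(outputs.data, requires_grad=True)` pattern is taken from the issue (rewritten with modern `torch.tensor`, which behaves the same way for this purpose):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_class_cross_entropy_broken(logits, old_logits, T=2.0):
    """Distillation loss that detaches the graph, as reported in the issue.

    Returning a fresh leaf tensor built from the loss value (the old
    `Variable(outputs.data, requires_grad=True)` pattern) severs the link
    to the model's parameters, so no gradient ever reaches them.
    """
    log_p = F.log_softmax(logits / T, dim=1)
    targets = F.softmax(old_logits / T, dim=1)
    loss = -(targets * log_p).sum(dim=1).mean()
    # The problematic pattern: a new leaf tensor with no history.
    return torch.tensor(loss.item(), requires_grad=True)

def multi_class_cross_entropy_fixed(logits, old_logits, T=2.0):
    """Same loss, but returned directly so the graph stays intact."""
    log_p = F.log_softmax(logits / T, dim=1)
    targets = F.softmax(old_logits / T, dim=1)
    return -(targets * log_p).sum(dim=1).mean()

model = nn.Linear(4, 3)
x = torch.randn(8, 4)
old_logits = torch.randn(8, 3)  # stand-in for the frozen old model's outputs

loss_broken = multi_class_cross_entropy_broken(model(x), old_logits)
loss_broken.backward()
print(model.weight.grad)  # None: no gradient flowed back into the model

loss_fixed = multi_class_cross_entropy_fixed(model(x), old_logits)
loss_fixed.backward()
print(model.weight.grad)  # populated: the distillation term now trains the model
```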

b224618 avatar Dec 28 '21 02:12 b224618

> I found the implementation of MultiClassCrossEntropy confusing. […]

Have you got any new ideas?

cht619 avatar Apr 15 '22 07:04 cht619

> Have you got any new ideas?

My first point was a misunderstanding on my part, but I still believe the second one is indeed a problem.
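For context on why the first point is only an apparent discrepancy: raising softmax probabilities to the power 1/T and renormalizing (the paper's form) is algebraically identical to applying softmax to logits divided by T (the form commonly found in code), since p_i^(1/T) ∝ exp(z_i/T). A small numerical check, using hypothetical logits:

```python
import torch
import torch.nn.functional as F

T = 2.0
logits = torch.randn(5, 10)  # hypothetical logits

# Common code form: softmax of temperature-scaled logits.
p_code = F.softmax(logits / T, dim=1)

# Paper form: softmax probabilities raised to 1/T, then renormalized.
p = F.softmax(logits, dim=1)
p_paper = p.pow(1.0 / T)
p_paper = p_paper / p_paper.sum(dim=1, keepdim=True)

print(torch.allclose(p_code, p_paper, atol=1e-6))  # True: the two forms match
```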

b224618 avatar Oct 06 '22 07:10 b224618

@b224618 Why is your first question a misunderstanding?

NguyenQuangMinh0504 avatar Feb 24 '24 12:02 NguyenQuangMinh0504