LWF
The implementation of MultiClassCrossEntropy seems incorrect
Hi, thanks a lot for sharing this implementation. However, I found the implementation of MultiClassCrossEntropy confusing. My concerns are two-fold:
- The computation of the outputs and labels. In the original LwF paper, the (1/T) appears as an exponent on the probabilities, not as a multiplicative factor.
- In the line `return Variable(outputs.data, requires_grad=True).cuda()`, the new Variable built from `.data` is detached from the computation graph that produced `outputs`, so optimizing it will not update the model. As a result, the dist_loss in line 155 of model.py has no effect, and the performance is the same after removing this term (see the sketch after this comment).
Apologies if there is any mistake in my understanding, and thanks again for sharing the implementation!
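For reference, here is a minimal sketch of what I mean by the second point, written against current PyTorch (so no Variable wrapper). The function and argument names (`multiclass_cross_entropy`, `logits`, `target_logits`, `T`) are mine for illustration, not the repo's:

```python
import torch.nn.functional as F

def multiclass_cross_entropy(logits, target_logits, T=2.0):
    """Distillation loss between current outputs and recorded (teacher) outputs.

    Note: dividing the logits by T before the softmax is algebraically the
    same as raising the softmax probabilities to the power 1/T and
    renormalizing, which is how the LwF paper writes it.
    """
    log_p = F.log_softmax(logits / T, dim=1)           # student: keeps the graph
    q = F.softmax(target_logits.detach() / T, dim=1)   # teacher: treated as constants
    # Cross-entropy between the softened distributions. Crucially, the result
    # is NOT re-wrapped via `.data` into a fresh tensor, so gradients flow
    # back through `logits` into the model and dist_loss actually has an effect.
    return -(q * log_p).sum(dim=1).mean()
```

With a version like this, calling `.backward()` on the total loss propagates gradients into the network that produced `logits`, which is the whole point of the distillation term.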
Have you got any new ideas?
My first question was a misunderstanding on my part, but I still believe the second one is indeed a problem.
@b224618 Why is your first question a misunderstanding?