LWF
The implementation of MultiClassCrossEntropy seems incorrect
Hi, thanks a lot for sharing this implementation. However, I found the implementation of MultiClassCrossEntropy confusing. My concerns are two-fold:
- The computation of the outputs and labels. In the original LwF paper, the (1/T) appears as an exponent on the probabilities, not as a multiplicative factor.
- In the line `return Variable(outputs.data, requires_grad=True).cuda()`, the new Variable built from `.data` is detached from the computation graph that produced `outputs`, so optimizing it will not update the model. As a result, the dist_loss in line 155 of model.py has no effect, and the performance is the same after removing this term (see the sketch after this comment).
Apologies if there is any mistake in my understanding, and thanks again for sharing the implementation!
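For reference, here is a minimal sketch of what I mean by the second point, written against current PyTorch (so no Variable wrapper). The function and argument names (`multiclass_cross_entropy`, `logits`, `target_logits`, `T`) are mine for illustration, not the repo's:

```python
import torch.nn.functional as F

def multiclass_cross_entropy(logits, target_logits, T=2.0):
    """Distillation loss between current outputs and recorded (teacher) outputs.

    Note: dividing the logits by T before the softmax is algebraically the
    same as raising the softmax probabilities to the power 1/T and
    renormalizing, which is how the LwF paper writes it.
    """
    log_p = F.log_softmax(logits / T, dim=1)           # student: keeps the graph
    q = F.softmax(target_logits.detach() / T, dim=1)   # teacher: treated as constants
    # Cross-entropy between the softened distributions. Crucially, the result
    # is NOT re-wrapped via `.data` into a fresh tensor, so gradients flow
    # back through `logits` into the model and dist_loss actually has an effect.
    return -(q * log_p).sum(dim=1).mean()
```

With a version like this, calling `.backward()` on the total loss propagates gradients into the network that produced `logits`, which is the whole point of the distillation term.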
Have you got any new ideas?
My first question was a misunderstanding on my part, but I still believe the second one is indeed a problem.
@b224618 Why is your first question a misunderstanding?