End-to-End-Incremental-Learning
Performance
I cannot reproduce the performance reported in the paper. The first model, trained from scratch, only reaches about 85% average accuracy with an incremental step of 10, whereas the paper reports close to 90%. The incremental-learning performance is even worse. In my experience, the performance is sensitive to the hyperparameters, e.g., the learning rate and the augmentation strategy. If you find any mistakes or have suggestions, feel free to contact me. Thank you!
I think the temperature T you set and the distillation loss differ from the paper's. Also, the paper says that when training on novel classes, the network creates a new classification layer (the CL_i blocks in Fig. 1 of the paper), whereas your code keeps the same net (ResNet) and trains its parameters without adding a new classification layer. That's just my opinion; I may be wrong, hah.
Actually, I have tried different temperatures T. I also tried muting the classifiers for classes that haven't been learned yet, and dynamically adding new classifiers to the network. But the performance is still low.
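For reference, "dynamically adding new classifiers" as described above (one CL_i block per incremental step, following Fig. 1 of the paper) might look like the sketch below. This is a hypothetical illustration, not the repository's actual code; the class name, `feature_dim`, and `add_head` are my own inventions.

```python
import torch
import torch.nn as nn

class IncrementalClassifier(nn.Module):
    """Hypothetical sketch: a pool of per-increment classification heads.

    Each call to add_head() appends a fresh linear head for the new
    classes; logits from all heads are concatenated, so old heads keep
    their learned weights while the newest head starts from scratch.
    """
    def __init__(self, feature_dim):
        super().__init__()
        self.feature_dim = feature_dim
        self.heads = nn.ModuleList()

    def add_head(self, num_new_classes):
        # One new CL_i block per incremental step.
        self.heads.append(nn.Linear(self.feature_dim, num_new_classes))

    def forward(self, features):
        # Concatenate the logits of every head seen so far.
        return torch.cat([head(features) for head in self.heads], dim=1)
```

Here the backbone (e.g., the ResNet feature extractor) would feed `features` into this module; only the concatenation over heads changes as classes are added.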
Maybe you just changed the value of T? The formula involving T in the paper raises p_i and q_i to the exponent 1/T; it is not simply p_i/T. Also, the distillation loss in the paper is computed from p_i and q_i (the modified versions), where q_i is the ground truth; but in your code, I find that the distillation loss uses the old classification logits and the new classification logits. Do you think that is the issue?
Thank you! I think q_i should be the old logits, even though the description of q_i in the paper is confusing. If q_i were the ground truth, i.e., one-hot labels, how could knowledge transfer from the old model to the new model?
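One point worth noting in this exchange: raising the softmax probabilities to the exponent 1/T and renormalizing (as the paper writes it) is algebraically the same as applying softmax to logits divided by T. A minimal sketch of a distillation loss under that reading, with the old (frozen) model providing the soft targets, could be (the function name and default T are my own assumptions):

```python
import torch
import torch.nn.functional as F

def distillation_loss(old_logits, new_logits, T=2.0):
    """Cross-entropy between temperature-softened old- and new-model outputs.

    Sketch under one reading of the paper: p (teacher) comes from the
    frozen old model, q (student) from the current model.  Note that
    softmax(z)^(1/T), renormalized, equals softmax(z / T).
    """
    p = F.softmax(old_logits / T, dim=1)          # teacher soft targets
    log_q = F.log_softmax(new_logits / T, dim=1)  # student log-probabilities
    return -(p * log_q).sum(dim=1).mean()
```

This makes the equivalence concrete: whether one divides the logits by T or raises the probabilities to 1/T and renormalizes, the resulting distributions are identical, so the choice of formulation alone should not change the loss.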
Maybe we can discuss it by email: bozhaonanjing @ gmail
Just wanted to know the current performance. @PatrickZH
Hi. I did some work and found some training tricks. But recently I have been busy with something else. I may update these things in a few weeks. Keep in contact!
Best regards, Bo Zhao
Hardik Chauhan [email protected] wrote on Saturday, August 3, 2019 at 2:48 PM:
Could you please update the code so that it matches the results reported in the paper?
I am busy with a new Continual Learning task now. I may update it after finishing that task, perhaps in about 3 weeks. Sorry for that.
Thanks for your contribution! Looking forward to your update!
Hi, does anybody know the definition of the symbol (shown as an inline image in the original post) in this paper's loss formula?
@PatrickZH Thanks for sharing the code with us. I have several questions. Is p_i in the distillation loss the ground-truth label, or the probability before updating the weights? And do you achieve the same accuracy reported in the paper?
- p^dist_ij is the ground truth produced by the old model, while q^dist_ij is produced by the new model.
- Not yet.
Best regards, Bo Zhao
On Sun, May 24, 2020 at 3:37 AM ninja [email protected] wrote:
Hi @PatrickZH, thanks for your great work on this. I'm curious about the way you alternate the net and net_old variables when forwarding the input and passing parameters to the optimizer, as follows. Could you please explain it in more detail?
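For readers landing on this question: the usual pattern behind a net / net_old split in incremental learning is to snapshot the current network as a frozen teacher before each increment, and give only the live network's parameters to the optimizer. The sketch below is a hypothetical illustration of that bookkeeping, not the repository's actual code; the function names and the SGD default are my own assumptions.

```python
import copy
import torch

def begin_new_increment(net, lr=0.1):
    """Hypothetical sketch of the net / net_old bookkeeping.

    Snapshot the current network as a frozen teacher (net_old); only
    the live net's parameters go to the optimizer, so the teacher
    never receives gradient updates.
    """
    net_old = copy.deepcopy(net)
    net_old.eval()
    for p in net_old.parameters():
        p.requires_grad_(False)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    return net_old, optimizer

def training_step(net, net_old, x):
    # Teacher forward pass without gradients: its logits serve only
    # as distillation targets for the live network.
    with torch.no_grad():
        old_logits = net_old(x)
    # Live forward pass keeps the graph for backprop.
    new_logits = net(x)
    return old_logits, new_logits
```

With this split, alternating between the two networks in the training loop is just: forward net_old under no_grad for the distillation targets, forward net normally, combine the losses, and step the optimizer (which only holds net's parameters).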