WENO: Question about reproducing the results on Camelyon16

Thanks for your great work!

I followed the instructions in the paper and its appendix to reproduce the results on Camelyon16. However, I observed that training was slow to converge (the loss only starts to decrease quickly at around the 100th epoch). Possibly as a result, the bag-level AUC of the teacher was only about 0.6, whereas the instance-level AUC of the student was very high (0.94).

I am not sure whether I used the same hyperparameters as the paper. Could you provide the full hyperparameter settings used for training on Camelyon16?

Nov 11 '22 03:11 liupei101

Through experiments, I found that the issues above were largely due to the following settings (as provided in the source code of this repo):

Batch size = 1
A simple SGD optimizer

I changed them as follows.

Batch size = 4, implemented via gradient accumulation
Adam optimizer

With these changes, everything became more reasonable.
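
For anyone who wants to try the same change, here is a minimal PyTorch sketch of an effective batch size of 4 via gradient accumulation with Adam. It is not the code from this repo: the `train_one_epoch` helper, the linear model, the mean pooling over instances, the feature dimension, and the learning rate are all hypothetical placeholders chosen only to make the example self-contained.

```python
import torch
from torch import nn


def train_one_epoch(model, bags, labels, optimizer, accum_steps=4):
    """Process one bag at a time, but update the weights every `accum_steps` bags."""
    criterion = nn.BCEWithLogitsLoss()
    model.train()
    optimizer.zero_grad()
    num_updates = 0
    for i, (bag, label) in enumerate(zip(bags, labels), start=1):
        # Hypothetical bag-level score: mean of the instance logits.
        logit = model(bag).mean()
        # Scale the loss so the accumulated gradient is the average over the group.
        # (A trailing group with fewer than `accum_steps` bags is slightly down-weighted.)
        loss = criterion(logit, label) / accum_steps
        loss.backward()  # gradients add up across bags until zero_grad() is called
        if i % accum_steps == 0 or i == len(bags):
            optimizer.step()
            optimizer.zero_grad()
            num_updates += 1
    return num_updates


torch.manual_seed(0)
model = nn.Linear(16, 1)  # placeholder for the real network
# Eight toy bags with different numbers of 16-dimensional instances.
bags = [torch.randn(n, 16) for n in (5, 8, 3, 6, 7, 4, 9, 5)]
labels = [torch.tensor(float(i % 2)) for i in range(8)]
# Adam instead of plain SGD; the learning rate here is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
num_updates = train_one_epoch(model, bags, labels, optimizer, accum_steps=4)
print(num_updates)
```

Accumulation keeps the memory footprint of batch size 1, which matters because bags have different numbers of instances and cannot be stacked into one tensor without padding.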

Nov 11 '22 03:11 liupei101

Thanks for your attention and your contribution!

Mar 18 '23 02:03 HardworkingLittlequ