Reproducing experiments "On Tiny Episodic Memories in Continual Learning"
Hi,
I tried to reproduce the results described in section 5.5 of your paper "On Tiny Episodic Memories in Continual Learning", because I couldn't find an implementation in your codebase and the experiment seemed relatively easy to reproduce. I'm mostly interested in the 20-degree rotation setting, where fine-tuning on the second task does not harm performance on the first one, so in fact I only want to reproduce the corresponding figure from section 5.5.
I've skimmed the paper and gathered the following hyperparameters (a sketch of my setup follows the list):
- MLP with 2 hidden layers of 256 units each, each followed by ReLU
- SGD with lr=0.1
- CrossEntropy loss
- Minibatch size=10
- A single pass through the whole dataset
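This is roughly what my reproduction looks like. It's a minimal sketch in PyTorch (your codebase is TensorFlow, so none of these names come from it), assuming task 1 is unrotated MNIST and task 2 is the same data rotated by 20 degrees; the helpers `make_loader`, `train_one_pass`, and `accuracy` are my own:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import torchvision.transforms.functional as TF

def make_loader(rotation_deg, train=True):
    # Rotate every image by a fixed angle (0 for task 1, 20 for task 2;
    # my assumption about how the rotated tasks are constructed).
    tfm = transforms.Compose([
        transforms.Lambda(lambda img: TF.rotate(img, rotation_deg)),
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x.view(-1)),  # flatten to 784
    ])
    ds = datasets.MNIST("data", train=train, download=True, transform=tfm)
    return DataLoader(ds, batch_size=10, shuffle=train)

# 2 hidden layers, 256 units each, ReLU after each hidden layer
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_one_pass(loader):
    # A single pass through the whole dataset, as in the paper.
    model.train()
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

@torch.no_grad()
def accuracy(loader):
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

for deg in (0, 20):  # fine-tune sequentially: task 1, then task 2
    train_one_pass(make_loader(deg, train=True))
    print(f"after task rotated {deg} deg:",
          {d: accuracy(make_loader(d, train=False)) for d in (0, 20)})
```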
Unfortunately, with this setup I found that after finishing the first task my network has 96% accuracy on the test set, in contrast to the 85% you reported, and fine-tuning only on the second task does lead to catastrophic forgetting (not so catastrophic in this case, but it costs ~5% of accuracy on the first task's test set).
Could you please share any further details about your experimental setup? Am I missing something?