shampoo.pytorch

Problem with CIFAR-10 Experiment Reproduction

Open cwfxcz opened this issue 6 years ago • 2 comments

Hi there, thanks for your great work on the Shampoo implementation in PyTorch. I'm trying to reproduce the CIFAR-10 results from the Shampoo paper, but I get much lower test accuracy. I have tried changing the learning rate from 0.01 to 10 (the range the paper suggests), but still only reach about 85% accuracy. Here are my experimental results:

  • We use the ResNet-32 network for the CIFAR-10 experiments.
  • Flags: --momentum 0.9 --epsilon 1e-4 --batchSize 128 (setup sketch below)
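
For reference, my training setup looks roughly like this. This is a minimal sketch, not my exact script: `resnet32()` stands in for whatever CIFAR-10 ResNet-32 implementation is used, and I'm assuming the repo's `Shampoo` class accepts `lr`, `momentum`, and `epsilon` keyword arguments matching the flags above.

```python
import torch
from shampoo import Shampoo  # optimizer class from this repo

model = resnet32().cuda()  # hypothetical CIFAR-10 ResNet-32 constructor
criterion = torch.nn.CrossEntropyLoss()
optimizer = Shampoo(model.parameters(), lr=1.0, momentum=0.9, epsilon=1e-4)

for inputs, targets in train_loader:  # batch size 128
    optimizer.zero_grad()
    loss = criterion(model(inputs.cuda()), targets.cuda())
    loss.backward()
    optimizer.step()
```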

All runs below are 250 epochs:

| lr  | Training Loss | Training Acc | Testing Loss | Testing Acc |
|-----|---------------|--------------|--------------|-------------|
| 0.1 | 0.65          | 77.03%       | 0.68         | 76.39%      |
| 1   | 0.25          | 91.33%       | 0.57         | 84.04%      |
| 2   | 0.23          | 91.87%       | 0.72         | 82.02%      |
| 5   | 0.22          | 92.33%       | 0.75         | 82.04%      |

When training for 500 epochs with each of the learning rates above, the testing accuracy remains almost the same; it still can't even reach 90%.

Any ideas or suggestions about this problem? Thanks for your time.

cwfxcz · Mar 23 '18

Thank you for your comprehensive experiments. Indeed, I also cannot reproduce the reported results with my implementation, even when using the average of gradients. I'm still investigating the reason; if you find something, please let me know.

moskomule · Mar 23 '18

Hi, I have some questions about the Algorithm 2 code. In the Shampoo paper, for each dimension the algorithm uses the original gradient $G_t$ to compute the contraction $H_t^i$.
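
For reference, my reading of the inner loop of Algorithm 2 in the paper, for an order-$k$ gradient $G_t$ with $G_t^{(i)}$ its mode-$i$ matricization, is:

$$H_t^i = H_{t-1}^i + G_t^{(i)} \big(G_t^{(i)}\big)^{\top}, \qquad \tilde{G}_t \leftarrow \tilde{G}_t \times_i \big(H_t^i\big)^{-1/(2k)},$$

where $\tilde{G}_t$ is initialized to $G_t$ before the loop, so every contraction $H_t^i$ is built from the untouched $G_t$.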

But in the code, the gradient tensor is updated for each dimension and then used to calculate the contraction for the next dimension. Is something wrong with my understanding of the code, or of Algorithm 2 in the paper?
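
To make the difference concrete, here is a minimal sketch of the two orderings as I understand them. This is not the repo's actual code; `_inv_root` and `_mode_mul` are helpers I made up for illustration, and `preconds` is a list of per-dimension matrices initialized as `epsilon * I`.

```python
import torch

def _inv_root(mat, p):
    """mat^(-1/p) for a symmetric PSD matrix, via eigendecomposition."""
    eigvals, eigvecs = torch.linalg.eigh(mat)
    return eigvecs @ torch.diag(eigvals.clamp(min=1e-12).pow(-1.0 / p)) @ eigvecs.t()

def _mode_mul(tensor, mat, i):
    """Multiply `mat` into dimension i of `tensor` (mode-i product)."""
    t = tensor.transpose(0, i).contiguous()
    shape = t.size()
    return (mat @ t.view(shape[0], -1)).view(shape).transpose(0, i)

def precondition_paper(grad, preconds):
    """Algorithm 2 as written in the paper: every contraction H^i
    is computed from the matricization of the ORIGINAL gradient G_t."""
    k = grad.dim()
    g_tilde = grad.clone()
    for i in range(k):
        mat = grad.transpose(0, i).contiguous().view(grad.size(i), -1)
        preconds[i] += mat @ mat.t()          # H^i built from untouched G_t
        g_tilde = _mode_mul(g_tilde, _inv_root(preconds[i], 2 * k), i)
    return g_tilde

def precondition_repo_style(grad, preconds):
    """My reading of the repo: grad is overwritten per dimension, so the
    contraction for dimension i sees an already-preconditioned tensor."""
    k = grad.dim()
    g = grad.clone()
    for i in range(k):
        mat = g.transpose(0, i).contiguous().view(g.size(i), -1)
        preconds[i] += mat @ mat.t()          # H^i built from preconditioned g
        g = _mode_mul(g, _inv_root(preconds[i], 2 * k), i)
    return g
```

If my reading is right, the two variants coincide only for order-1 tensors, since the first contraction is always computed before any preconditioning is applied; for matrices and higher-order tensors they diverge from the second dimension onward.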

cwfxcz · Mar 26 '18