
1cycle Policy: Unexpected results

Open · karanchahal opened this issue Jul 24, 2018 · 1 comment

Hey,

I was implementing the 1cycle policy as an exercise, and I have a few observations from my experiments. My setup:

Model: ResNet18
Training batch size: 128
Test batch size: 100
Optimizer: optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
Total epochs: 26

1cycle policy: the learning rate goes from 0.01 up to 0.1 and back down over the first 24 epochs.

The model is then trained for 2 more epochs at a learning rate of 0.001.

No cyclic momentum or AdamW is used.
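
For concreteness, here is a minimal sketch of that schedule (not the exact code I ran; it assumes the learning rate is updated once per epoch and that the ramp up and down is linear):

```python
def one_cycle_lr(epoch, total_cycle=24, lr_min=0.01, lr_max=0.1, lr_final=0.001):
    """Learning rate for a given (0-indexed) epoch under the schedule above."""
    half = total_cycle / 2
    if epoch < half:                   # ramp up: 0.01 -> 0.1 over epochs 0-11
        return lr_min + (lr_max - lr_min) * epoch / half
    elif epoch < total_cycle:          # ramp down: 0.1 -> 0.01 over epochs 12-23
        return lr_max - (lr_max - lr_min) * (epoch - half) / half
    else:                              # final fine-tuning phase: epochs 24-25
        return lr_final

# Applied to the SGD optimizer from the setup above:
# for epoch in range(26):
#     for group in optimizer.param_groups:
#         group['lr'] = one_cycle_lr(epoch)
#     train_one_epoch(...)
```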

I achieved a test set accuracy of 93.4% in 26 epochs.

This seems like a big difference from the 70 epochs at batch size 512 quoted in your blog post.

Am I doing something wrong? Is the number of epochs a good metric to base results on, given that it depends on the batch size?

The whole point of super-convergence is using high learning rates to converge more quickly, but it seems like training with low learning rates (0.01-0.1, versus 0.8-3) is actually faster.

karanchahal · Jul 24 '18

Sorry, I didn't see this until now. The blog post you're referring to is a bit old, from when we were just getting to grips with super-convergence. Now we can train to 94% accuracy in 30 epochs (see here) with 1cycle and AdamW.
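
For reference, here is a minimal sketch of what such a setup could look like with current PyTorch built-ins (torch.optim.AdamW and torch.optim.lr_scheduler.OneCycleLR). We used the fastai library for the results above, so this is only an illustrative equivalent, and the hyperparameters below are placeholders rather than the ones from the blog post:

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

epochs, steps_per_epoch = 30, 391  # e.g. CIFAR-10 at batch size 128
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-2,                   # peak learning rate of the cycle (placeholder)
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
)

# Training loop skeleton: OneCycleLR steps once per batch, not per epoch.
# for epoch in range(epochs):
#     for inputs, targets in train_loader:
#         loss = torch.nn.functional.cross_entropy(model(inputs), targets)
#         optimizer.zero_grad()
#         loss.backward()
#         optimizer.step()
#         scheduler.step()
```

Note that OneCycleLR also cycles momentum by default (for AdamW it cycles beta1), which matches the cyclic-momentum part of the 1cycle recipe.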

sgugger · Aug 17 '18