All-Conv-Keras
Accuracy is ~80% after 350 epochs
Hi @vibrantabhi19:
Thank you for sharing your code! It has been very helpful for understanding All-CNN.
I trained your model last night for 350 epochs, but found that its accuracy (i.e. val_acc) plateaued at about 0.81 after epoch 49 and stayed there to the end.
Any ideas? :) 👍
The model I used:
```python
from keras.models import Sequential
from keras.layers import Conv2D, Activation, Dropout, GlobalAveragePooling2D
from keras.optimizers import SGD

model = Sequential()
model.add(Conv2D(96, (3, 3), padding="same", input_shape=(32, 32, 3)))
model.add(Activation('relu'))
model.add(Conv2D(96, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(96, (3, 3), padding="same", strides=2))
model.add(Dropout(0.5))
model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (3, 3), padding="same", strides=2))
model.add(Dropout(0.5))
model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (1, 1), padding="valid"))
model.add(Activation('relu'))
model.add(Conv2D(10, (1, 1), padding="valid"))
model.add(GlobalAveragePooling2D())
model.add(Activation('softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
```
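For completeness, a minimal sketch of the surrounding training setup (the batch size here is illustrative, not a value from my run):

```python
# Hypothetical training setup -- batch_size=64 is an assumption
from keras.datasets import cifar10
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model.fit(x_train, y_train, batch_size=64, epochs=350,
          validation_data=(x_test, y_test))
```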
Hi @ChesterAiGo,
Thanks.
As far as I can tell, you should try a different set of learning parameters; maybe try Adam as your optimizer, because the network is not able to converge.
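Swapping the optimizer is a one-line change; a minimal sketch (the learning rate here is just Adam's Keras default, not a tuned value):

```python
from keras.optimizers import Adam

# Recompile the same model with Adam instead of SGD
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['accuracy'])
```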
Also, in the original paper a learning-rate schedule S = (e1, e2, e3) was used, in which the learning rate γ is multiplied by a fixed factor of 0.1 after e1, e2, and e3 epochs respectively (where e1 = 200, e2 = 250, e3 = 300).
Maybe you can have a go at that; a minimal sketch follows below.
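Something like this, using Keras's LearningRateScheduler callback (the 0.01 base rate matches your SGD setting above; the milestones are the paper's):

```python
from keras.callbacks import LearningRateScheduler

def schedule(epoch):
    # Start at the base rate and multiply gamma by 0.1 after
    # e1 = 200, e2 = 250, e3 = 300 epochs, per the paper's schedule S.
    lr = 0.01
    for milestone in (200, 250, 300):
        if epoch >= milestone:
            lr *= 0.1
    return lr

# model.fit(..., callbacks=[LearningRateScheduler(schedule)])
```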
What's your training accuracy? Comparing it with the validation accuracy would show whether the model is overfitting.
Hi @vibrantabhi19,
Thanks for your prompt reply! I will try different optimizers, as well as varying γ during training (I suspect that is the cause).
In addition, there was something very interesting about the accuracies: the training accuracy kept increasing steadily (from epoch 1 to epoch 350), while the validation accuracy became stable after epoch 49 (it was neither increasing nor decreasing, which is weird xD).
It looks something like:
Epoch 1: val 0.1, train 0.1 ... Epoch 49: val 0.8, train 0.8 ... Epoch 350: val 0.8, train 0.94
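In case it's useful, this is roughly how I plotted these curves from the History object that model.fit returns (assuming the 'acc'/'val_acc' metric keys of older Keras):

```python
import matplotlib.pyplot as plt

# x_train / y_train / x_test / y_test prepared as in my first message
history = model.fit(x_train, y_train, batch_size=64, epochs=350,
                    validation_data=(x_test, y_test))

plt.plot(history.history['acc'], label='train')
plt.plot(history.history['val_acc'], label='val')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.savefig('accuracy.png')
```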
Thanks again! :)
Oh, that's weird; the network shouldn't be able to overfit, since we are already using a dropout of 0.5.
Since the network is converging (train_acc = 0.94 is proof of that), I don't think trying out different optimizers will help; anyway, go ahead with the experiment and post your results here.
I will try investigating on my end (the same code has worked for a lot of people, so I am not able to figure out the exact error).
I can confirm that using the original code (with the fix in https://github.com/MateLabs/All-Conv-Keras/pull/5) and removing the multi_gpu code yields an accuracy over 81%. My best after 350 epochs using the code of this repository was 90.88%; however, it had already cracked 90% by epoch 140.
Accuracy, loss, and the learning-rate decay over the run are attached as CSVs; see also the full console log.
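For context, Keras's time-based SGD decay reduces the rate per update as lr_t = lr / (1 + decay * iterations). A minimal sketch of the effect, assuming a batch size of 64 (not stated in the thread):

```python
# Keras's time-based SGD decay: lr_t = lr / (1 + decay * iterations)
def decayed_lr(base_lr=0.01, decay=1e-6, iterations=0):
    return base_lr / (1.0 + decay * iterations)

# ~782 updates per epoch on CIFAR-10 (50,000 images / batch size 64)
print(decayed_lr(iterations=350 * 782))  # ~0.0078 after 350 epochs
```

So with decay=1e-6 the rate shrinks only very gently over the whole run.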
All source code and weights are at https://aetros.com/marcj/keras:all-conv/view/refs/aetros/job/92fcd671c6814c375edd404a65edc66c00ba5aec, or in the analytics tool at https://trainer.aetros.com/model/marcj/keras:all-conv/job/92fcd671c6814c375edd404a65edc66c00ba5aec (requires login first).
Hyperparameters and other information are there as well.
So, what I can say is: I cannot reproduce the plateau at 81%. @ChesterAiGo, you can fork my model at https://aetros.com/marcj/keras:all-conv and try to run it on your hardware, so we have all the information needed to debug it.
However, I'd also like to know why this code does not reproduce the results from the linked paper, and what is concretely needed to achieve 95.59% on CIFAR-10 using All-CNN.
These are some sexy plots. 90 percent accuracy!