All-Conv-Keras

Accuracy is ~80% after 350 epochs

ChesterAiGo opened this issue 7 years ago • 5 comments

Hi @vibrantabhi19,

Thank you for sharing your code! That's very helpful for me to understand All-CNN.

In addition, I trained your model last night for 350 epochs, but found that its accuracy (i.e. val_acc) became stable (about 0.81) after epoch 49 and remained the same to the end.

Any ideas? :) 👍

The model I used:

```python
from keras.models import Sequential
from keras.layers import Conv2D, Activation, Dropout, GlobalAveragePooling2D
from keras.optimizers import SGD

model = Sequential()

model.add(Conv2D(96, (3, 3), padding="same", input_shape=(32, 32, 3)))
model.add(Activation('relu'))
model.add(Conv2D(96, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(96, (3, 3), padding="same", strides=2))
model.add(Dropout(0.5))

model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (3, 3), padding="same", strides=2))
model.add(Dropout(0.5))

model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (1, 1), padding="valid"))
model.add(Activation('relu'))
model.add(Conv2D(10, (1, 1), padding="valid"))

model.add(GlobalAveragePooling2D())
model.add(Activation('softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
```

ChesterAiGo avatar Aug 04 '17 00:08 ChesterAiGo

Hi @ChesterAiGo, thanks. As far as I can tell, you should try a different set of learning parameters; maybe try Adam as your optimizer, since the network is not able to converge. Also, in the original paper a schedule S = "e1, e2, e3" was used, in which γ is multiplied by a fixed factor of 0.1 after e1, e2 and e3 epochs respectively (where e1 = 200, e2 = 250, e3 = 300). Maybe you can have a go at that. What's your training accuracy? Checking the training accuracy would show whether the model is overfitting.
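The step schedule described above can be expressed as a small function and plugged into `model.fit` via Keras's `LearningRateScheduler` callback. A minimal sketch, assuming the base learning rate of 0.01 from the model in this thread and the milestones e1 = 200, e2 = 250, e3 = 300 quoted from the paper:

```python
# Sketch of the paper's step schedule: multiply the learning rate by
# gamma = 0.1 once for every milestone epoch already passed.
# base_lr, gamma and the milestones are assumptions taken from this thread.
def lr_schedule(epoch, base_lr=0.01, gamma=0.1, milestones=(200, 250, 300)):
    """Return the learning rate for a given (0-indexed) epoch."""
    factor = 1.0
    for m in milestones:
        if epoch >= m:
            factor *= gamma
    return base_lr * factor

# Usage (hypothetical, with the model defined above):
# from keras.callbacks import LearningRateScheduler
# model.fit(X_train, Y_train, epochs=350,
#           callbacks=[LearningRateScheduler(lr_schedule)])
```

So epochs 0–199 train at 0.01, epochs 200–249 at 0.001, and so on; the callback calls the function once at the start of each epoch.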

iabhi7 avatar Aug 04 '17 07:08 iabhi7

Hi @vibrantabhi19

Thanks for your prompt reply! I will try different optimizers, and also try varying γ during training (I think that's probably why).

In addition, there was something very interesting about the accuracies: the training accuracy kept increasing steadily (from epoch 1 to epoch 350) while the validation accuracy became stable (not increasing, but not decreasing either, which is weird xD) after epoch 49.

It looks something like:

Epoch 1: Val: 0.1, Train: 0.1 ... Epoch 49: Val: 0.8, Train: 0.8 ... Epoch 350: Val: 0.8, Train: 0.94
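The pattern above (validation accuracy stuck while training accuracy keeps climbing) can be spotted programmatically from a Keras `history.history['val_acc']` list. A small hypothetical helper, not part of the code in this thread:

```python
# Hypothetical helper: find the epoch after which val accuracy plateaus,
# i.e. the last epoch that improved the best-so-far value by min_delta.
def plateau_epoch(val_acc, min_delta=0.005):
    """Return the (0-indexed) last epoch with a meaningful improvement;
    everything after it is the plateau."""
    best, best_epoch = float('-inf'), 0
    for epoch, acc in enumerate(val_acc):
        if acc > best + min_delta:
            best, best_epoch = acc, epoch
    return best_epoch

# Usage (hypothetical):
# history = model.fit(...)
# print(plateau_epoch(history.history['val_acc']))
```

On a trace like the one quoted above this would report roughly epoch 49; a widening gap between train and val accuracy past that point is the usual overfitting signature.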

Thanks again ! :)

ChesterAiGo avatar Aug 04 '17 09:08 ChesterAiGo

Oh, that's weird; the network shouldn't be overfitting, since we are already using a dropout of 0.5. And since the network is converging (train_acc = 0.94 is proof of that), I don't think trying out different optimizers will help. Anyway, go ahead with the experiment and post your results here. I will try investigating on my end (the same code has worked for a lot of people, so I am not able to figure out the exact error).

iabhi7 avatar Aug 04 '17 10:08 iabhi7

I can confirm that using the original code (with the fix in https://github.com/MateLabs/All-Conv-Keras/pull/5) and removing the multi_gpu code yields an accuracy over 81%. My best after 350 epochs using the code of this repository was 90.88%. It actually cracked 90% as early as epoch 140.

See accuracy (as CSV): [screenshot: accuracy curves]

And loss (as CSV): [screenshot: loss curves]

The learning rate decay produced this (as CSV): [screenshot: learning rate schedule]

See also full console log.

and all source code + weights here: https://aetros.com/marcj/keras:all-conv/view/refs/aetros/job/92fcd671c6814c375edd404a65edc66c00ba5aec or in the analytics tool at https://trainer.aetros.com/model/marcj/keras:all-conv/job/92fcd671c6814c375edd404a65edc66c00ba5aec (requires login first)

Hyperparameters and other information: [screenshot: hyperparameters]

So what I can say: I cannot reproduce the model getting stuck at 81%. @ChesterAiGo, you can fork my model at https://aetros.com/marcj/keras:all-conv and try to run it on your hardware, so we have all the information needed to debug it.

However, I'd also like to know why this code does not reproduce the results from the linked paper, and what is concretely needed to reach 95.59% on CIFAR-10 using All-CNN.

marcj avatar Sep 23 '17 19:09 marcj

These are some sexy plots, 90 percent accuracy!

JaeDukSeo avatar May 28 '19 16:05 JaeDukSeo