
Network is not learning :(

Rabia-Metis opened this issue 7 years ago • 31 comments

Thanks for the amazing work. I am having an issue with network learning: my NASNet model isn't learning. Training accuracy is improving, but validation accuracy isn't changing and is stuck at 0.4194. Training data = 600 images, testing data = 62 images. Image shape = (224, 224, 3). Epochs = 10-15.

[screenshot: training log showing stagnant validation accuracy]

Rabia-Metis avatar Dec 25 '17 12:12 Rabia-Metis

Have you tried a simpler model and seen if it is able to learn on your dataset? Are the train and validation images from the same dataset?

titu1994 avatar Dec 25 '17 16:12 titu1994

Yes, I have tried ResNet-152 and SENet and achieved average accuracy above 80%. And yes, both are from the same dataset.

Rabia-Metis avatar Dec 26 '17 08:12 Rabia-Metis

Interesting. What type of preprocessing are you using? Mean/std normalization, or scaling to -1 to 1? Also, are you using the default regularization value? Try setting it to 0.

titu1994 avatar Dec 26 '17 10:12 titu1994

I tried both with preprocessing (subtracting the mean and dividing by the standard deviation) and without. I also tried setting weight_decay=0 as you suggested, but it's not making any difference:

model = NASNet(classes=2, input_shape=(224, 224, 3), weights=None, penultimate_filters=4032, nb_blocks=6, use_auxiliary_branch=True, skip_reduction=False, weight_decay=0)

You can view the logs here https://www.floydhub.com/ptanikon2/projects/n-net/3

Rabia-Metis avatar Dec 26 '17 14:12 Rabia-Metis

Why not try fine-tuning one of the pre-trained NASNet models rather than training from scratch? Since you seem to have significant computation available, I suggest using NASNet Mobile as a base and adding layers on top to build your final classifier.

titu1994 avatar Dec 26 '17 14:12 titu1994

Also, use the -1 to 1 preprocessing for NASNets, and especially when using pretrained weights for fine-tuning.
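
A minimal sketch of that -1 to 1 scaling (assuming uint8 RGB inputs; recent Keras versions also ship keras.applications.nasnet.preprocess_input, which applies the same transform):

import numpy as np

def preprocess(x):
    # Map uint8 pixel values from [0, 255] to the [-1, 1] range NASNet expects.
    x = x.astype(np.float32)
    x /= 127.5
    x -= 1.0
    return x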

titu1994 avatar Dec 26 '17 14:12 titu1994

Okay, but weights are not available for NASNET_LARGE_WEIGHT_PATH_WITH_auxiliary_NO_TOP = "https://github.com/titu1994/Keras-NASNet/releases/download/v1.1/NASNet-auxiliary-large-no-top.h5". And when I don't use the auxiliary branch and use NASNet-large-no-top.h5 instead, it gives me the following error:

[screenshot: error traceback when loading NASNet-large-no-top.h5]

Rabia-Metis avatar Dec 27 '17 07:12 Rabia-Metis

Use NASNet Mobile, not Large. Weights for all models are available. Refer to the Keras blog post on fine-tuning to see how to use no-top models for training.
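
For reference, a minimal no-top fine-tuning sketch. This uses keras.applications.NASNetMobile (available in Keras >= 2.2) as an assumption; this repo's NASNet constructor takes different arguments:

from keras.applications.nasnet import NASNetMobile
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# Load the no-top ImageNet base and freeze it.
base = NASNetMobile(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
for layer in base.layers:
    layer.trainable = False

# Add a new classifier head for the 2-class problem from this thread.
x = GlobalAveragePooling2D()(base.output)
out = Dense(2, activation='softmax')(x)

model = Model(base.input, out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])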

titu1994 avatar Dec 27 '17 08:12 titu1994

I have tried NASNet Mobile, but its results are also not good (attached). Kindly suggest what else I can try. Secondly, there is a typo in the auxiliary weights path, due to which the weights were not loading earlier: change 'auxiliary' to 'auxilary' in the weights path.

[screenshot: NASNet Mobile training log with poor validation accuracy]

Rabia-Metis avatar Jan 17 '18 11:01 Rabia-Metis

Did you find a solution to your problem? I've encountered a similar problem, and here is my post on Stack Overflow.

maystroh avatar Jan 23 '18 10:01 maystroh

@maystroh No, not yet. Let me know if you find any solution.

Rabia-Metis avatar Jan 25 '18 18:01 Rabia-Metis

Have you tried training with the auxiliary loss for both the mobile and large models? It is a strong regularizer which is required for training the NASNet-A model from scratch (though it is not as important when just fine-tuning the Dense layers that you add).
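
A minimal sketch of what that looks like, assuming a model built with use_auxiliary_branch=True, which gives it a main and an auxiliary output; the 0.4 loss weight follows the paper's auxiliary-head convention, and x_train/y_train are placeholder arrays:

# The auxiliary head gets the same labels as the main head, with a
# down-weighted loss so it acts as a regularizer rather than the objective.
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              loss_weights=[1.0, 0.4],
              metrics=['accuracy'])

model.fit(x_train, [y_train, y_train],
          validation_data=(x_val, [y_val, y_val]),
          epochs=15, batch_size=32)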

titu1994 avatar Jan 25 '18 18:01 titu1994

Yes @titu1994, I have tried the auxiliary branch with both, but it's also not making any difference.

Rabia-Metis avatar Jan 31 '18 18:01 Rabia-Metis

I don't really have an answer then. It's either a problem with the regularizer strength, or with the auxiliary branch not working, or perhaps the model is simply too large and overfitting.

The fact that the mobile version doesn't do better either points to some problem other than the above, though. Can you give more information about the dataset: what it is about, the number of samples, the size of the images, and the task you are performing (classification, regression, or bounding box regression)?

titu1994 avatar Jan 31 '18 19:01 titu1994

@titu1994 I experienced the same issue. You can see the training history in the middle of my Jupyter notebook.

Agent007 avatar Jan 31 '18 23:01 Agent007

@Agent007 An 87-million-parameter model for a dataset of 6600 images: I would say that is cause for concern for any large model, not just NASNet. Use the NASNet Mobile version and see if that reduces the error rate.

With your validation and test set scores being so low compared to training, I think there is something wrong with the data itself.

In the cases from this thread, the discrepancy isn't that vast. The reduced performance is a concern, but your case is different.

titu1994 avatar Jan 31 '18 23:01 titu1994

My network learned from the data. I used the CIFAR 768 model, but:

  1. It takes 5 minutes per epoch to train on a P600.
  2. My results were nowhere near those reported in the paper. I got about 92.7% on the test set... I wonder why there would be such a dramatic difference.

pGit1 avatar Feb 14 '18 18:02 pGit1

I was reading through the paper again. They used a lot of tricks to get such high performance. Just skimming through a few:

  • Use Cutout regularization for CIFAR (a sketch follows this list)
  • Use DropPath regularization
  • Use the auxiliary branch
  • Resize CIFAR to 40x40 and then take random crops
  • Cosine annealing learning rate (though I think this was only used during architecture search)
  • Maybe something else I missed.
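
A minimal Cutout sketch (the default patch size of 16 is an assumption taken from the Cutout paper's CIFAR-10 setting):

import numpy as np

def cutout(image, size=16):
    # Zero out a randomly positioned square patch of the image.
    h, w = image.shape[:2]
    y, x = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip([y - size // 2, y + size // 2], 0, h)
    x1, x2 = np.clip([x - size // 2, x + size // 2], 0, w)
    out = image.copy()
    out[y1:y2, x1:x2] = 0
    return out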

I think the two major regularizers are DropPath and the auxiliary branch. Auxiliary branches are significant regularizers; you don't usually attach such an odd branch to a model unless it significantly impacts the learning process.

Also, Keras in general does not seem to be able to exactly match the performance of PyTorch or base TensorFlow models (even when using TensorFlow as the backend), for some reason. Differences in random initialization are an obvious and acceptable cause, but a 2-4% gap is a little weird. I try to match the papers almost word for word in Keras, down to the decay, initializers, biases, and everything else, but the result always seems to be 2-4% below what the paper claimed.

To date, I have never been able to get a basic ResNet-50 to the level of performance claimed in the original paper using Keras. TF manages to get close enough, though (with a 0.035% absolute difference, which is probably due to random initialization).

titu1994 avatar Feb 15 '18 19:02 titu1994

That is weird. Not sure why that would be. In my example I used Cutout and some other augmentation techniques and was still that far off. If their model is that dependent on an annealed schedule and the auxiliary branch, then I wouldn't say there is anything special about the network itself.

Also, what is the topology of an auxiliary branch? I read an updated paper that used NASNet, and all it talked about was Normal and Reduction cells. It would be cool to see a plot_model output or something similar to get a feel for what it is actually doing.
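
For what it's worth, a minimal sketch of doing exactly that with Keras's plot_model utility (assuming pydot and graphviz are installed, and `model` is a built NASNet instance):

from keras.utils import plot_model

# Renders the full computation graph, including the auxiliary head, to a PNG.
plot_model(model, to_file='nasnet.png', show_shapes=True)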

Thanks for the insights, by the way.

pGit1 avatar Feb 16 '18 15:02 pGit1

[figure: diagram of stacked Normal Cells from the NASNet paper, with hidden states h_i, h_{i-1} and a dotted "cloud" node]

I'm having a hard time interpreting this cloud with the dots in the middle of it, and I can't find what it means in the paper. As a result, I don't understand what the difference between h_i and h_{i-1} is. Is the cloud performing some type of operation?

pGit1 avatar Feb 17 '18 02:02 pGit1

@pGit1 It's meant to show how multiple layers of Normal Cells should be connected when stacked on top of each other. In this case, there are skip connections from the input of the previous Normal Cell to the convolutions within the current Normal Cell.

Agent007 avatar Feb 17 '18 02:02 Agent007

@Agent007 I am not sure I am following. So h_i is the concatenated output of h_{i-1}, and the dotted lines represent the concatenated output of h_{i-2}?

pGit1 avatar Feb 17 '18 05:02 pGit1

h_i is the output of your current cell. h_{i-1} is the input to your current cell. h_{i-2} is the input to the cell prior to your current cell.
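
In pseudocode, the recurrence looks like this (normal_cell, stem_out_a, and stem_out_b are illustrative names, not this repo's API):

# Each Normal Cell takes the two most recent hidden states as input:
# h_i = cell(h_{i-1}, h_{i-2}).
h_prev2, h_prev1 = stem_out_a, stem_out_b  # initial states from the stem
for _ in range(num_cells):
    h = normal_cell(h_prev1, h_prev2)
    h_prev2, h_prev1 = h_prev1, h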

titu1994 avatar Feb 17 '18 06:02 titu1994

@titu1994 Which layer would you freeze from when fine-tuning ImageNet weights on a (200000, 224, 224, 3) dataset with 128 classes?

alkari avatar Apr 23 '18 05:04 alkari

@alkari I'd suggest holding off on using the Keras version of NASNet for fine-tuning purposes for the moment. There have been multiple independent reports suggesting the weight loading mechanism was not perfect.

You can try fine-tuning the models from the TensorFlow models repository directly for best results.

titu1994 avatar Apr 23 '18 06:04 titu1994

Thanks @titu1994. Wouldn't that be resolved by freezing deep enough into the network, though? That's why I was wondering if there's an optimal layer to freeze from. Here's where I have it:

model.trainable = True  # ensure the model as a whole is trainable

# Freeze all layers before 'activation_253'; unfreeze from there onwards.
set_trainable = False
for layer in model.layers:
    if layer.name == 'activation_253':
        set_trainable = True
    layer.trainable = set_trainable
    print("layer {} is {}".format(layer.name,
                                  '+++trainable' if layer.trainable else '---frozen'))

Current results so far, training without augmentation:

Epoch 48/378 100/100 [==============================] - 538s 5s/step - loss: 2.9667 - predictions_loss: 0.7452 - aux_predictions_loss: 0.1686 - predictions_acc: 0.7764 - aux_predictions_acc: 0.9573 - val_loss: 4.2562 - val_predictions_loss: 1.5506 - val_aux_predictions_loss: 1.3832 - val_predictions_acc: 0.5697 - val_aux_predictions_acc: 0.6290

alkari avatar Apr 23 '18 15:04 alkari

Usually all layers up to the last convolutional layer are frozen, and then you add a new classifier to the end. I don't know how, but it's learning something. I'm guessing that even though the weights aren't loading correctly, the trainable portion of the network is still learning something useful.

Try setting a baseline with a small BatchNorm VGG network first.
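
A minimal sketch of such a baseline (the filter counts and head size are illustrative assumptions; the 128-way softmax matches the dataset described above):

from keras.models import Sequential
from keras.layers import (Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, GlobalAveragePooling2D, Dense)

# Small VGG-style stack: Conv -> BN -> ReLU -> pool, repeated three times.
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(224, 224, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D())
for filters in (64, 128):
    model.add(Conv2D(filters, (3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D())
model.add(GlobalAveragePooling2D())
model.add(Dense(128, activation='softmax'))  # 128 classes, per the question above
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])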

titu1994 avatar Apr 23 '18 15:04 titu1994

Is weight loading completely fixed after the latest commit? @titu1994

kunwar31 avatar May 30 '18 23:05 kunwar31

Yes, though I haven't tested the auxiliary branches yet; the weights have been ported. If they still don't work, it's going to be a problem.

titu1994 avatar May 30 '18 23:05 titu1994

It seems like overfitting. Try a simpler model.

DongfeiJi avatar Jun 25 '19 04:06 DongfeiJi