[QUESTION] How exactly does auxiliary output in a NN help regularize
Hi! In chapter 10, under "use cases for multiple outputs", the last point says that auxiliary outputs can help regularize.
The exact bullet point is: "Another use case is as a regularization technique (i.e., a training constraint whose objective is to reduce overfitting and thus improve the model's ability to generalize). For example, you may want to add some auxiliary outputs in a NN architecture to ensure that the underlying part of the network learns something useful on its own, without relying on the rest of the network."
I just don't understand this. How exactly does it regularize? A detailed explanation would be appreciated! Thanks
Hi @Ruhil-DS ,
Suppose the neural net's main task is to classify images of cats and dogs. Just two classes. Perhaps in this particular training set, most dogs happen to have darker fur than the cats, so that's all the neural net learns, since it doesn't need anything more to fit the training set. But of course that shortcut doesn't generalize well to real-life images.
Now suppose you add an auxiliary output that forces the neural net to reconstruct its input. The neural net will then be forced to preserve a lot of information in order to reconstruct the input as well as it can, but it will still need to compress that information, since there's a limited number of neurons in the top hidden layer. With that richer representation, the main output layer will be able to predict cat/dog based on more than just the brightness level. It may not perform as well on the training set, but it will probably generalize better to new images: that's regularization.
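Here's a minimal tf.keras sketch of that idea, just to make it concrete (the 28×28 grayscale input shape, layer sizes, and loss weights are placeholders I picked for illustration, not values from the book):

```python
import tensorflow as tf
from tensorflow import keras

# Shared "underlying part" of the network, ending in a small bottleneck.
inputs = keras.layers.Input(shape=(28, 28, 1))
x = keras.layers.Flatten()(inputs)
x = keras.layers.Dense(128, activation="relu")(x)
codings = keras.layers.Dense(32, activation="relu")(x)  # limited number of neurons

# Main output: cat vs. dog classification.
main_output = keras.layers.Dense(1, activation="sigmoid", name="class")(codings)

# Auxiliary output: reconstruct the input from the same codings.
recon = keras.layers.Dense(28 * 28, activation="sigmoid")(codings)
aux_output = keras.layers.Reshape((28, 28, 1), name="recon")(recon)

model = keras.Model(inputs=inputs, outputs=[main_output, aux_output])

# The reconstruction loss is down-weighted: it only nudges the shared layers
# to preserve information about the input; the main task still dominates.
model.compile(
    loss={"class": "binary_crossentropy", "recon": "mse"},
    loss_weights={"class": 1.0, "recon": 0.05},
    optimizer="adam",
)

# During training, the input itself serves as the target for the auxiliary output:
# model.fit(X_train, {"class": y_train, "recon": X_train}, epochs=10)
```

At inference time you simply ignore the reconstruction output; it only exists to shape what the shared layers learn during training.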
As an example of this technique, check out the original capsule networks paper by Sabour, Frosst, and Hinton (or see my video on this topic: https://youtu.be/pPN8d0E3900).
Hope this helps.