[Chapter 17] Question: Does an autoencoder based on an advanced CNN such as SE-ResNet make any sense?
Hi,
I was about to work on exercise 9 from Chapter 17 (denoising autoencoder), and wanted to try using the best classifier I have trained so far on MNIST digits, an SE-ResNet, as the basis for the encoder.
Here are my questions:
- Does it make any sense to build an encoder using such a complex NN as a basis? I would, of course, replace the last layer of the encoder with a wider Dense layer with a relu activation (instead of softmax), roughly as in the sketch after this list. I just fear that this kind of classifier NN tends to remove data that is essential for the later decoding steps...?
- Is it viable or counterproductive to simplify the decoder down to a simple Dense/Conv2DTranspose combination (i.e., having a non-symmetrical autoencoder)? Or do I need to implement a reversed version of the SE-ResNet for the decoder?
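Here is a rough sketch of what I mean for the first point (assuming the SE-ResNet classifier was built with the functional API; the file name and the size of the coding layer are just placeholders):

from tensorflow import keras

# Placeholder: the trained SE-ResNet MNIST classifier (functional API assumed)
se_resnet_classifier = keras.models.load_model("se_resnet_mnist.h5")

# Reuse everything up to (but not including) the final softmax layer,
# then add a wider Dense bottleneck as the coding layer
features = se_resnet_classifier.layers[-2].output   # output just before the softmax
codings = keras.layers.Dense(30, activation="relu")(features)
denoising_encoder = keras.Model(inputs=se_resnet_classifier.input, outputs=codings)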
Thanks for your answers/insights.
After posting this question I found two partial answers, but without any explanation of why it is good or bad:
- A ResNet used as an encoder here: https://towardsdatascience.com/u-nets-with-resnet-encoders-and-cross-connections-d8ba94125a2c
- I also found this thread on Reddit, but it does not seem conclusive about the results of that kind of autoencoder implementation: https://www.reddit.com/r/MachineLearning/comments/54tdww/autoencoders_using_residual_networks/
Just so you know, with the Dense-only decoder below,
denoising_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
I get this not-so-bad(?) result:

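For completeness, this is roughly how such a decoder can be wired up and trained as a denoising autoencoder (the small Dense encoder and the noise level are just placeholders for illustration, not the SE-ResNet-based encoder I have in mind; X_train/X_valid are the MNIST images scaled to [0, 1]):

from tensorflow import keras

# Placeholder encoder: a plain Dense stack with input noise, just to show the wiring
denoising_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.GaussianNoise(0.2),            # noise is only applied during training
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu"),  # 30-dim codings, matching the decoder
])

denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])
denoising_ae.compile(loss="binary_crossentropy", optimizer="nadam")
history = denoising_ae.fit(X_train, X_train, epochs=10,
                           validation_data=(X_valid, X_valid))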
It seems that using Conv2DTranspose does not give better results than the fully connected Dense decoder. How can this be explained?
denoising_decoder = keras.models.Sequential([
    keras.layers.Dense(14 * 14, activation="relu", input_shape=[30]),
    keras.layers.Reshape([14, 14, 1]),
    keras.layers.Conv2DTranspose(filters=1, kernel_size=3, strides=2,
                                 padding="same", activation="sigmoid")
])

Thanks for your interesting question.
The optimal model always depends on the dataset (that's basically the conclusion of the "no free lunch" theorem). There will be cases where a more powerful model will help (typically when the images are complex and you have a lot of training data), and others where a simpler model will be better. For example, MNIST is so simple that a Dense network can work just as well as (or sometimes even better than) a more complex model.
So it's hard to make general statements, as the answer is usually empirical: "give it a try, and you will see".
That said, there are some general rules that tend to work quite often. For example, ConvNets generally work much better than Dense networks for images. Indeed, a ConvNet makes some implicit assumptions about the images, such as the fact that neighboring pixels are more correlated than distant pixels, and these assumptions usually hold in real-life images.
So, to answer your question, I believe it should be quite possible to build a good autoencoder using an SE-ResNet as the encoder. Of course, be careful to avoid having skip connections jump over the bottleneck layer (if you are building an autoencoder with a bottleneck); see the sketch below. But the final performance will really depend on the dataset.
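For instance, one safe structure is to keep all the residual/SE skip connections inside the encoder, and to make the bottleneck a plain layer that everything must flow through, roughly like this (just a sketch with placeholder sizes, not a full SE-ResNet):

from tensorflow import keras

inputs = keras.layers.Input(shape=[28, 28, 1])

# Encoder: residual blocks are fine here, as long as their skip connections
# stay entirely on the encoder side of the bottleneck
z = keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)
skip = z
z = keras.layers.Conv2D(64, 3, padding="same", activation="relu")(z)
z = keras.layers.Add()([z, skip])   # skip connection *within* the encoder
z = keras.layers.Flatten()(z)

# Bottleneck: a plain Dense layer that every path must go through,
# so no skip connection can carry information around it
codings = keras.layers.Dense(30, activation="relu")(z)

# Decoder: kept deliberately simple here, just to illustrate the structure
z = keras.layers.Dense(14 * 14, activation="relu")(codings)
z = keras.layers.Reshape([14, 14, 1])(z)
outputs = keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same",
                                       activation="sigmoid")(z)

autoencoder = keras.Model(inputs=inputs, outputs=outputs)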
Hope this answers your question.
Yes, that answers my question. Thank you.
I was mainly interested in the impact on the decoder part only (due to an SE-ResNet-like encoder), but your answer applies there as well. More generally, I was interested in the impact of having an asymmetric autoencoder (with a simpler decoder than encoder), and whether there is any 'proven' work on whether this has a general impact or not. If you know of some published work about that, I would be interested. I did not find many publications or factual results on this topic, only some experiments and opinions: https://www.reddit.com/r/MachineLearning/comments/ef1xe8/d_should_autoencoders_really_be_symmetric/ and, strangely, only three arXiv papers with these keywords in the title:
Thanks for your feedback and the interesting ideas and links. Sorry, I'm not aware of additional work on asymmetric autoencoders: it seems like an interesting thing to dig into!