
Leaking data in the max-pool indices

Open efirdc opened this issue 5 years ago • 3 comments

One RGBA pixel is 32 bits, so a 2x2 block of pixels is 128 bits.

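Each max-pool index stores 2 bits of data (one of four positions in a 2x2 window). The first convolutional block has 64 channels, so the max-pooling indices for that block hold 2 * 64 = 128 bits per window, which is exactly enough to encode the corresponding 2x2 block of input pixels. Those indices get passed straight to the end of the network.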
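The bit counting above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 8 bits per color channel and the 2x2 pooling window are assumptions consistent with the numbers quoted in this comment, and all constant names are made up for illustration.

```python
# Assumed values (not read from the repository's code):
BITS_PER_COLOR_VALUE = 8      # 8-bit R, G, B, A
COLOR_CHANNELS = 4            # RGBA
POOL_SIZE = 2                 # 2x2 max-pool window
FIRST_BLOCK_CHANNELS = 64     # feature channels in the first conv block

# Bits needed to describe one 2x2 block of input pixels.
positions_per_window = POOL_SIZE * POOL_SIZE
image_bits_per_window = positions_per_window * COLOR_CHANNELS * BITS_PER_COLOR_VALUE

# Each max-pool index picks one of the window positions, so it carries
# log2(positions) bits; with 4 positions that is 2 bits.
bits_per_index = (positions_per_window - 1).bit_length()
index_bits_per_window = bits_per_index * FIRST_BLOCK_CHANNELS

print(image_bits_per_window, index_bits_per_window)
```

Under these assumptions both quantities come out equal, which is the coincidence that makes the leak possible.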

efirdc avatar Nov 13 '20 18:11 efirdc

@efirdc can you please elaborate? I'm trying to fix and use this model. By the way, do the inputs and outputs of each layer (up and down) make sense to you?

ramidzamzam avatar Dec 02 '20 16:12 ramidzamzam

No, unfortunately I would not recommend using this architecture. It is built more like a segmentation architecture than an autoencoder. 99% of the model will not be used, since the image can be passed straight through the max-pooling indices at the highest resolution. This is why the reconstruction is almost perfect: the image was never encoded.
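To make the capacity argument concrete, here is a small pure-Python sketch showing that 64 two-bit symbols are enough to carry a 2x2 RGBA block losslessly. It does not reproduce what the network actually learns. It only illustrates that the index channel is wide enough for a perfect copy, and the function names `pack_to_indices` and `unpack_from_indices` are hypothetical.

```python
def pack_to_indices(block_bytes):
    """Split 16 bytes (a 2x2 RGBA block) into two-bit symbols in 0..3,
    standing in for the 64 per-channel max-pool indices of one window."""
    indices = []
    for byte in block_bytes:
        for shift in (6, 4, 2, 0):
            indices.append((byte >> shift) & 0b11)
    return indices


def unpack_from_indices(indices):
    """Rebuild the original bytes from the two-bit symbols."""
    out = bytearray()
    for i in range(0, len(indices), 4):
        byte = 0
        for symbol in indices[i:i + 4]:
            byte = (byte << 2) | symbol
        out.append(byte)
    return bytes(out)


# Arbitrary example data: 4 pixels x 4 channels = 16 bytes.
block = bytes(range(10, 170, 10))
indices = pack_to_indices(block)
restored = unpack_from_indices(indices)
print(len(indices), restored == block)
```

The round trip is exact, so a decoder that receives the indices has, in principle, everything it needs without using the bottleneck at all.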

If you removed the max-pooling indices, the model would most likely still not work, because there are too many layers for an encoder. The image signal would be lost and the model would be impossible to optimize.

efirdc avatar Dec 02 '20 17:12 efirdc

@efirdc Yeah I was impressed by the quality of the output.

ramidzamzam avatar Dec 02 '20 18:12 ramidzamzam