The vanilla CNN downsampling architecture cannot recover the spatial information of an image

Open · zhiqwang opened this issue 5 years ago • 1 comment

The convolutional part of the architecture acts as an encoder: it captures the image's contextual information. The architecture should also include a decoder part (a deconvolution layer or an RNN layer) to recover the image's spatial information.

zhiqwang, Apr 15 '19 12:04

The CNN architectures currently implemented here fall into two categories according to how much they pool the image's width: densenet121 compresses the width to 1/8, and densenet_cifar compresses it to 1/4. As a result, the current network architecture cannot handle text of varying width.
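To make the width arithmetic concrete, here is a minimal sketch. It assumes that "by 1/8" and "by 1/4" mean the feature map width is the image width divided by 8 or 4 with floor rounding, and it ignores padding. The helper names `feature_width` and `steps_per_backbone` are hypothetical and are not part of sightseq.

```python
# Width reduction factors quoted in this thread (assumed to be exact divisors).
FACTORS = {"densenet121": 8, "densenet_cifar": 4}


def feature_width(image_width: int, factor: int) -> int:
    """Feature map width after downsampling (floor division is an assumption)."""
    return image_width // factor


def steps_per_backbone(image_width: int) -> dict:
    """Number of horizontal positions each backbone leaves for a given image width."""
    return {name: feature_width(image_width, f) for name, f in FACTORS.items()}


if __name__ == "__main__":
    # The number of positions left for the sequence changes with the input width.
    for width in (64, 128, 256):
        print(width, steps_per_backbone(width))
```

Under these assumptions the number of horizontal positions is tied to the input width by a fixed ratio, so a narrow image leaves only a few positions for the text it contains.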

zhiqwang, Apr 16 '19 02:04