keras_ocr
What are those output shapes of lambda_1 & blstm layers in ctpn?
In the ctpn model, I've replaced the bidirectional unit GRU with LSTM.
x2 = Bidirectional(LSTM(128,return_sequences=True), name='blstm')(x1)
But the shape of blstm really confused me. Take an image as input, say its shape is [512,512,3]. Then the output shape of rpn_conv1 is [None,32,32,512] and the output shape of lambda_2 is [None,32,32,256]. So what are the shapes of lambda_1 and blstm?
rpn_conv1 (Conv2D) (None, 32, 32, 512)
-----------------------------------------------------
lambda_1 (Lambda) (None, None, 512)
-----------------------------------------------------
blstm (Bidirectional) (None, None, 256)
-----------------------------------------------------
lambda_2 (Lambda) (None, 32, 32, 256)
Below is the source code of reshape and reshape2. My suspicion is that, since the batch size is 1, rpn_conv1.output.shape is [1,32,32,512]; after the reshape function, lambda_1.output.shape is [32,32,512]. Then blstm.output.shape is [32,32,256] and lambda_2.output.shape is [1,32,32,256].
def reshape(x):
    import tensorflow as tf
    b = tf.shape(x)
    # [B, H, W, C] -> [B*H, W, C]
    x = tf.reshape(x, [b[0] * b[1], b[2], b[3]])
    return x

def reshape2(x):
    import tensorflow as tf
    x1, x2 = x
    # use x2's dynamic shape to recover B, H, W
    b = tf.shape(x2)
    # [B*H, W, 256] -> [B, H, W, 256]
    x = tf.reshape(x1, [b[0], b[1], b[2], 256])
    return x
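To check my understanding of the shape flow, here is a minimal sketch of the same shape arithmetic, with NumPy standing in for tf.reshape and the dimensions taken from my example:

```python
import numpy as np

# Simulated rpn_conv1 output: [batch=1, H=32, W=32, C=512]
x = np.zeros((1, 32, 32, 512))

# reshape: fold batch into height -> [B*H, W, C]
b = x.shape
x1 = x.reshape(b[0] * b[1], b[2], b[3])
print(x1.shape)  # (32, 32, 512)

# after the BLSTM the channel dim becomes 256 (128 forward + 128 backward)
y = np.zeros((b[0] * b[1], b[2], 256))

# reshape2: restore the spatial layout -> [B, H, W, 256]
y1 = y.reshape(b[0], b[1], b[2], 256)
print(y1.shape)  # (1, 32, 32, 256)
```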
However, when I tried to replace the two Lambda layers with Keras' built-in Reshape layers, I got this error:
ValueError: total size of new array must be unchanged
which suggests that the total size after lambda_1 differs from 1x32x32x512, and thus that lambda_1's output shape is not [32,32,512]. But this conflicts with what I understand from the source code of the reshape function. Could you please tell me the actual output shapes of the lambda_1 and blstm layers? Thanks a lot.
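My guess at the cause (an assumption about keras.layers.Reshape, not something confirmed in this thread): Reshape's target_shape excludes the batch axis, so it reshapes each sample independently and cannot fold the batch dimension into the height axis the way the Lambda's tf.reshape does. A quick per-sample size check shows why the total-size error would appear:

```python
import numpy as np

# What Reshape sees for each sample of rpn_conv1 (batch axis excluded):
per_sample = (32, 32, 512)
# A per-sample target that tries to mimic lambda_1's [B*H, W, C] folding:
target = (32, 512)

print(int(np.prod(per_sample)))  # 524288
print(int(np.prod(target)))      # 16384 -> sizes differ, hence the ValueError
```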
e.g.
rpn_conv1 is [batchsize=1, 32, 32, 512]
lambda_1 (Lambda) = (1x32, 32, 512)
Bidirectional(LSTM(128, return_sequences=True)): 128 + 128 = 256 => (1x32, 32, 256)
lambda_2 (Lambda) = (1, 32, 32, 256)
@xiaomaxiao Thank you for your reply. I think what you said is as below, right?
rpn_conv1 is [batchsize=1, 32, 32, 512]
lambda_1 (Lambda) = (1x32, 32, 512)
Bidirectional(LSTM(128, return_sequences=True)): 128 + 128 = 256 => (1x32, 32, 256)
lambda_2 (Lambda) = (1, 32, 32, 256)
I've modified the model to something like the below and also got good results when testing on images. But I do not know the reason behind it.
rpn_conv1 is [batchsize=1, 32, 32, 512]
lambda_1 (Lambda) = (1, 32x32, 512)
Bidirectional(LSTM(128, return_sequences=True)): 128 + 128 = 256 => (1, 32x32, 256)
lambda_2 (Lambda) = (1, 32, 32, 256)
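For reference, a minimal sketch of this modified flow, again with NumPy standing in for the TensorFlow ops. Because the batch axis is left untouched here, my assumption is that this variant could also be written with Keras' built-in Reshape((-1, 512)) layer (Reshape's target_shape excludes the batch axis), which would explain why built-in layers work for this version but not for the original [B*H, W, C] folding:

```python
import numpy as np

# rpn_conv1 output: [batch=1, H=32, W=32, C=512]
x = np.zeros((1, 32, 32, 512))

# modified lambda_1: keep batch, flatten H and W into one sequence axis
x1 = x.reshape(x.shape[0], -1, 512)
print(x1.shape)  # (1, 1024, 512): one sequence of 32*32 = 1024 timesteps

# after the BLSTM: 128 forward + 128 backward = 256 channels
y = np.zeros((1, 32 * 32, 256))

# modified lambda_2: restore the spatial layout
y1 = y.reshape(1, 32, 32, 256)
print(y1.shape)  # (1, 32, 32, 256)
```

So the two versions differ in what the BLSTM sees: 32 sequences of length 32 (one per image row) in the original, versus 1 sequence of length 1024 in the modification, while the final output shape is the same.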