What are those output shapes of lambda_1 & blstm layers in ctpn?

Open tianranwangcs opened this issue 6 years ago • 2 comments

In the CTPN model, I've replaced the GRU in the bidirectional unit with an LSTM.

x2 = Bidirectional(LSTM(128, return_sequences=True), name='blstm')(x1)

But the shape of blstm really confuses me. Take an image as input; say its shape is [512,512,3]. Then the output shape of rpn_conv1 is [None,32,32,512] and the output shape of lambda_2 is [None,32,32,256]. So what are the shapes of lambda_1 and blstm?

rpn_conv1 (Conv2D)              (None, 32, 32, 512)
-----------------------------------------------------
lambda_1 (Lambda)               (None, None, 512)
-----------------------------------------------------
blstm (Bidirectional)           (None, None, 256)
-----------------------------------------------------
lambda_2 (Lambda)               (None, 32, 32, 256)

Below is the source code of reshape and reshape2. My suspicion is that, since the batch size is 1, rpn_conv1's output shape is [1,32,32,512]; after the reshape function, lambda_1's output shape is [32,32,512]. Then blstm's output shape is [32,32,256] and lambda_2's output shape is [1,32,32,256].

def reshape(x):
    # Imported inside the function so the Lambda layer stays self-contained
    # when the model is serialized.
    import tensorflow as tf
    # Fold the batch axis into the height axis: [B, H, W, C] -> [B*H, W, C],
    # so each feature-map row becomes one sequence of length W for the BLSTM.
    b = tf.shape(x)
    x = tf.reshape(x, [b[0] * b[1], b[2], b[3]])
    return x


def reshape2(x):
    import tensorflow as tf
    # x1 is the BLSTM output [B*H, W, 256]; x2 is the original conv feature
    # map [B, H, W, C], used only for its runtime shape.
    x1, x2 = x
    b = tf.shape(x2)
    # Unfold back to [B, H, W, 256].
    x = tf.reshape(x1, [b[0], b[1], b[2], 256])
    return x
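
For reference, the two functions can be sanity-checked directly on dummy tensors (a minimal sketch, assuming TensorFlow 2.x eager execution; the stand-in tensor names are illustrative):

import tensorflow as tf

x = tf.zeros([1, 32, 32, 512])        # stand-in for the rpn_conv1 output, batch size 1

y = reshape(x)                        # what lambda_1 computes
print(y.shape)                        # (32, 32, 512), i.e. [B*H, W, C] = [1*32, 32, 512]

blstm_out = tf.zeros([32, 32, 256])   # stand-in for the blstm output (128 + 128 units)

z = reshape2([blstm_out, x])          # what lambda_2 computes
print(z.shape)                        # (1, 32, 32, 256), i.e. [B, H, W, 256]

The (None, None, 512) shown in the model summary only means Keras cannot infer those dimensions statically, because tf.reshape is given a runtime shape from tf.shape; at run time, lambda_1's output is (B*H, W, 512).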

However, when I try to replace the two Lambda layers with Keras's built-in Reshape layers, I get an error:

ValueError: total size of new array must be unchanged

which suggests to me that the total size after lambda_1 differs from 1x32x32x512, and thus reshape's output shape is not [32,32,512]. But this conflicts with what I understand from the source code of the reshape function. Would you please tell me the actual output shapes of the lambda_1 and blstm layers? Thanks a lot.
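
For context, this error is expected: Keras's built-in Reshape layer works per sample, so its target_shape excludes the batch axis and must preserve each sample's element count. lambda_1, however, folds the batch axis into the height axis, which Reshape cannot express. A minimal sketch (assuming tf.keras; shapes as in the summary above):

from tensorflow.keras.layers import Input, Reshape

inp = Input(shape=(32, 32, 512))     # batch axis is implicit and untouched by Reshape

# Fine: 32*32*512 elements per sample before and after.
ok = Reshape((32 * 32, 512))(inp)    # -> (None, 1024, 512)

# Raises ValueError: (32, 512) has 32*512 elements per sample, not 32*32*512.
# lambda_1 needs [B, 32, 32, 512] -> [B*32, 32, 512], i.e. it must merge the
# batch axis into the sequence axis, so a Lambda with tf.reshape is required.
bad = Reshape((32, 512))(inp)

So the total size after lambda_1 is in fact unchanged; the ValueError only reflects that Reshape cannot perform a batch-axis merge.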

tianranwangcs · Jul 06 '18 13:07

e.g. rpn_conv1 is [batchsize=1, 32, 32, 512]
lambda_1 (Lambda) = (1x32, 32, 512)
Bidirectional(LSTM(128, return_sequences=True)): 128 + 128 = 256 => (1x32, 32, 256)
lambda_2 (Lambda) = (1, 32, 32, 256)
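
For reference, the 128 + 128 = 256 comes from Bidirectional's default merge_mode='concat', which concatenates the forward and backward LSTM outputs along the feature axis. A minimal sketch (assuming tf.keras):

import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, LSTM

seq = tf.zeros([32, 32, 512])    # (1x32, 32, 512) coming out of lambda_1
out = Bidirectional(LSTM(128, return_sequences=True))(seq)
print(out.shape)                 # (32, 32, 256): 128 forward + 128 backward, concatenated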

xiaomaxiao · Jul 09 '18 02:07

@xiaomaxiao Thank you for your reply. I think what you said is like the below, right?

rpn_conv1 is [batchsize=1, 32, 32, 512]
lambda_1 (Lambda) = (1x32, 32, 512)
Bidirectional(LSTM(128, return_sequences=True)): 128 + 128 = 256 => (1x32, 32, 256)
lambda_2 (Lambda) = (1, 32, 32, 256)

I've modified the model to the shape flow below and also get good results when testing on images, but I do not know the reason behind it (a sketch of the corresponding code follows the list).

rpn_conv1 is [batchsize=1, 32, 32, 512]
lambda_1 (Lambda) = (1, 32x32, 512)
Bidirectional(LSTM(128, return_sequences=True)): 128 + 128 = 256 => (1, 32x32, 256)
lambda_2 (Lambda) = (1, 32, 32, 256)
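
A hypothetical sketch of what that modification looks like in code (the names reshape_flat and reshape2_flat are mine; only the first function differs from the originals above):

def reshape_flat(x):
    import tensorflow as tf
    # Keep the batch axis and flatten height*width into a single sequence
    # axis: [B, H, W, C] -> [B, H*W, C], so the BLSTM sees one long sequence
    # per image instead of one sequence per feature-map row.
    b = tf.shape(x)
    return tf.reshape(x, [b[0], b[1] * b[2], b[3]])


def reshape2_flat(x):
    import tensorflow as tf
    # Restore [B, H*W, 256] to [B, H, W, 256] using the conv feature map's shape.
    x1, x2 = x
    b = tf.shape(x2)
    return tf.reshape(x1, [b[0], b[1], b[2], 256])

Both variants are valid reshapes of the same elements, which is why training still succeeds; what changes is what the BLSTM treats as a sequence. In the original CTPN scheme it reads each feature-map row left to right (horizontal context, one length-W sequence per row), whereas the flattened variant reads all H*W positions as a single sequence per image.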

tianranwangcs · Jul 09 '18 05:07