STN_CNN_LSTM_CTC_TensorFlow
about the STN loss
Is there an STN loss that constrains the transformation angle? I didn't find one in your code. If there isn't one, how does the network know how to transform a picture?
@Fangction There is no STN loss; the STN is supervised by the CTC loss. However, if you have more supervised information, you can add a constraint, for example by supervising the coordinates regressed by the STN.
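A minimal sketch of what such a constraint could look like, assuming labelled control-point coordinates are available. The function names, the `weight` hyperparameter, and all numbers are hypothetical and not taken from the repository. It is written in plain Python to show the arithmetic only; in a real model the same terms would be built from tensors.

```python
def mean_squared_point_error(predicted, target):
    """Mean squared Euclidean distance between matching (x, y) points."""
    if len(predicted) != len(target):
        raise ValueError("point lists must have the same length")
    total = 0.0
    for (px, py), (tx, ty) in zip(predicted, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(predicted)


def combined_loss(ctc_loss, predicted, target, weight=0.1):
    """CTC loss plus a weighted penalty on control-point error.

    `weight` is a hypothetical hyperparameter, not a value from the repository.
    """
    return ctc_loss + weight * mean_squared_point_error(predicted, target)


# Hypothetical values: two control points, each off by 0.1 in y.
predicted_points = [(-1.0, -0.2), (1.0, 0.2)]
labelled_points = [(-1.0, -0.3), (1.0, 0.3)]
loss = combined_loss(2.0, predicted_points, labelled_points, weight=0.5)
```

Without such labels, the CTC loss alone drives the STN, as described above.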
@wushilian Thank you. I have another question. In your code:

    W = tf.Variable(tf.zeros([128, 20]))
    b = tf.Variable(initial_value=[-1, -0.2, -0.5, -0.35, 0, -0.5, 0.5, -0.67, 1, -0.8,
                                   -1, 0.8, -0.5, 0.65, 0, 0.5, 0.5, 0.33, 1, 0.2],
                    dtype=tf.float32)
    # fc3_loc=tf.layers.dense(fc2_loc,20,activation=tf.nn.tanh,kernel_initializer=tf.zeros_initializer)
    # fc3_loc = slim.fully_connected(fc2_loc, 8, activation_fn=tf.nn.tanh, scope='fc3_loc')
    # spatial transformer
    fc3_loc = tf.nn.tanh(tf.matmul(fc2_loc, W) + b)  # output of the activation function
    loc = tf.reshape(fc3_loc, [-1, 10, 2])  # reshape fc3_loc to shape [-1, 10, 2]
    # spatial transformer
    s = np.array([[-0.95, -0.95], [-0.5, -0.95], [0, -0.95], [0.5, -0.95], [0.95, -0.95],
                  [-0.95, 0.95], [-0.5, 0.95], [0, 0.95], [0.5, 0.95], [0.95, 0.95]] * 256)
    s = tf.constant(s.reshape([256, 10, 2]), dtype=tf.float32)

How did you decide the values of the variables b and s?
@Fangction b is a specific initialization. For details, see the paper "Robust Scene Text Recognition with Automatic Rectification".
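To make the role of the two constants concrete, here is a rough sketch using the values quoted in the question. Because W starts at zero, the first forward pass yields tanh(b) for every input, so b sets the starting positions of the 10 predicted control points. s looks like a fixed grid of 10 points along the top and bottom edges in normalized [-1, 1] coordinates, repeated once per batch item. The helper names below are hypothetical, and this reading of s is an assumption based on the quoted code, not a statement from the author.

```python
import math

# Bias values copied from the question; W is all zeros at initialization.
b = [-1, -0.2, -0.5, -0.35, 0, -0.5, 0.5, -0.67, 1, -0.8,
     -1, 0.8, -0.5, 0.65, 0, 0.5, 0.5, 0.33, 1, 0.2]


def initial_control_points(bias):
    """With W == 0, tanh(fc2_loc @ W + b) reduces to tanh(b) for any input."""
    activated = [math.tanh(v) for v in bias]
    # Pair consecutive values as (x, y), mirroring tf.reshape(..., [-1, 10, 2]).
    return [(activated[i], activated[i + 1]) for i in range(0, len(activated), 2)]


def tile_for_batch(points, batch_size):
    """Repeat one set of points per batch item, like `[...] * 256` then reshape."""
    return [list(points) for _ in range(batch_size)]


# Fixed grid from the question: five points on the top edge, five on the bottom.
s_single = [(-0.95, -0.95), (-0.5, -0.95), (0, -0.95), (0.5, -0.95), (0.95, -0.95),
            (-0.95, 0.95), (-0.5, 0.95), (0, 0.95), (0.5, 0.95), (0.95, 0.95)]

start_points = initial_control_points(b)
s_batch = tile_for_batch(s_single, batch_size=4)  # 4 is an arbitrary example size
```

Note that the quoted code hard-codes 256 when building s, so that constant has to match the batch size actually fed to the network.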
@wushilian Thank you again. I have one more question. Does the STN's output image have to be a fixed size? I see you define the output as image_width=120 and image_height=32. Can I keep the output image the same size as the input image?
@Fangction Yes, you can change the size.
I changed the size and it caused an error like "InvalidArgumentError (see above for traceback): len(seq_lens) != input.dims(0), (256 vs. 1536)".
What can I do to solve this error?
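One thing worth checking, offered as a guess rather than a confirmed diagnosis: the message says the sequence-length vector has 256 entries while dimension 0 of the tensor is 1536, which is exactly 6 × 256. That pattern can appear when a reshape with hard-coded sizes and a -1 leading dimension absorbs extra feature-map rows into the batch dimension after the image size changes. The sketch below only reproduces the shape arithmetic; the feature-map shapes and the helper name are hypothetical and not taken from the repository.

```python
def leading_dim_after_reshape(shape, time_steps, features):
    """Size of the inferred -1 dimension for reshape(x, [-1, time_steps, features])."""
    total = 1
    for dim in shape:
        total *= dim
    if total % (time_steps * features) != 0:
        raise ValueError("shape is not divisible by time_steps * features")
    return total // (time_steps * features)


batch = 256
# Hypothetical feature-map shapes (batch, height, width, channels).
original = (batch, 1, 30, 512)  # height collapsed to 1 by the CNN
resized = (batch, 6, 30, 512)   # a taller input leaves 6 rows

print(leading_dim_after_reshape(original, 30, 512))  # 256
print(leading_dim_after_reshape(resized, 30, 512))   # 1536
```

If this is the cause, printing the tensor shape right before the recurrent layers should show whether the batch dimension is still 256, and any hard-coded sizes in reshapes (and the 256 used to build s) would need to be updated to match the new image size.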