lstm_ctc_ocr
./test.sh error
2018-09-01 09:05:39.535363: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: sequence_length(0) <= 29
Traceback (most recent call last):
File "./lstm/test_net.py", line 73, in
@ilovin Training-set accuracy goes up, but test-set accuracy is 0. Could you help explain why? Also, the validation samples printed during training are always the same digits every time, and they are not from my validation set either.
What does the output look like when you test?
Here is part of the output (the part of the filename after the underscore is the ground-truth label):

4_31,759.jpg 15,2
6_3.jpg 78
7_996.jpg 7899
9_732.jpg 7,9
12_40,986.jpg 180,8
16_38,854.jpg 1,8606
26_0.jpg 740
36_0.jpg 780
37_0.jpg 780
43_367.jpg 7,6
45_7.jpg 74
47_10.jpg 780
49_0.jpg 70
55_14,950.jpg 149,8
56_0.jpg 7,0
60_0.jpg 7,0
65_0.jpg 747
67_0.jpg 746

I am training on the ten digits plus a comma, 11 characters in total. With image size (160,60) the model never converged (100k+ iterations); after changing to (120,45) it converged quickly. The training log is as follows:

iter: 100 / 1000000, total loss: 13.0400743, lr: 0.0001000 speed: 0.102s / iter
iter: 200 / 1000000, total loss: 10.5619059, lr: 0.0001000 speed: 0.117s / iter
iter: 300 / 1000000, total loss: 7.7125740, lr: 0.0001000 speed: 0.119s / iter
iter: 400 / 1000000, total loss: 2.0175159, lr: 0.0001000 speed: 0.111s / iter
iter: 500 / 1000000, total loss: 0.5158803, lr: 0.0001000 speed: 0.118s / iter
iter: 600 / 1000000, total loss: 0.3201181, lr: 0.0001000 speed: 0.115s / iter
iter: 700 / 1000000, total loss: 0.1470777, lr: 0.0001000 speed: 0.113s / iter
iter: 800 / 1000000, total loss: 0.4072588, lr: 0.0001000 speed: 0.115s / iter
iter: 900 / 1000000, total loss: 0.2619937, lr: 0.0001000 speed: 0.117s / iter
seq 0: origin: [8, 1, 6, 11] decoded:[8, 1, 6, 11]
seq 1: origin: [8, 7, 1, 10, 3, 4, 10, 5] decoded:[8, 7, 1, 10, 3, 4, 10, 5]
seq 2: origin: [2, 2, 4, 2, 3] decoded:[2, 2, 4, 2, 3]
seq 3: origin: [4, 11] decoded:[4, 11]
seq 4: origin: [3] decoded:[3]
accuracy: 0.98438
iter: 1000 / 1000000, total loss: 0.0657427, lr: 0.0001000 speed: 0.127s / iter
iter: 1100 / 1000000, total loss: 0.0259138, lr: 0.0001000 speed: 0.119s / iter
iter: 1200 / 1000000, total loss: 0.0610742, lr: 0.0001000 speed: 0.113s / iter
iter: 1300 / 1000000, total loss: 0.0725027, lr: 0.0001000 speed: 0.109s / iter
('loss: ', 0.014755567)
Wrote snapshot to: /home/liu/lstm_ctc_ocr-beta/output/lstm_ctc/lstm_ctc_iter_2.ckpt
seq 0: origin: [8, 1, 6, 11] decoded:[8, 1, 6, 11]
seq 1: origin: [8, 7, 1, 10, 3, 4, 10, 5] decoded:[8, 7, 1, 10, 3, 4, 10, 5]
seq 2: origin: [2, 2, 4, 2, 3] decoded:[2, 2, 4, 2, 3]
seq 3: origin: [4, 11] decoded:[4, 11]
seq 4: origin: [3] decoded:[3]
accuracy: 1.00000
iter: 1400 / 1000000, total loss: 0.1878222, lr: 0.0001000 speed: 0.113s / iter
iter: 1500 / 1000000, total loss: 0.0301566, lr: 0.0001000 speed: 0.117s / iter
('loss: ', 0.0140728075)
Wrote snapshot to: /home/liu/lstm_ctc_ocr-beta/output/lstm_ctc/lstm_ctc_iter_2.ckpt
seq 0: origin: [8, 1, 6, 11] decoded:[8, 1, 6, 11]
seq 1: origin: [8, 7, 1, 10, 3, 4, 10, 5] decoded:[8, 7, 1, 10, 3, 4, 10, 5]
seq 2: origin: [2, 2, 4, 2, 3] decoded:[2, 2, 4, 2, 3]
seq 3: origin: [4, 11] decoded:[4, 11]
seq 4: origin: [3] decoded:[3]
accuracy: 1.00000
iter: 1600 / 1000000, total loss: 0.2451135, lr: 0.0001000 speed: 0.113s / iter
('loss: ', 0.01386846)

But the test results are as shown above, so could you tell me what might be going on? Many thanks.

PS: The seq 0 through seq 4 sequences printed during validation above are not labels from my val set; I have no idea where they are being read from.
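One thing that might explain the sensitivity to image size: the number of frames the LSTM sees scales with the image width divided by the network's width downsampling, and CTC can only decode a label if it has at least as many frames as the label length plus one per adjacent repeated character. A rough check under that assumption (the stride value below is hypothetical, not this repo's actual architecture):

```python
# Back-of-the-envelope check: frames seen by the LSTM vs. frames a label needs.
import math

def time_steps(image_width, width_stride=4):
    # width_stride = 4 is an assumption; use your network's actual
    # downsampling factor along the width.
    return math.ceil(image_width / width_stride)

def min_frames_for(label):
    # CTC must insert a blank between adjacent repeated labels, so the
    # minimum frame count is len(label) plus one per adjacent repeat.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

label = [1, 10, 8, 6, 0, 6]                 # e.g. "1,8606" with ',' mapped to class 10
print(time_steps(160), time_steps(120))     # 40 vs 30 frames for the two widths
print(min_frames_for(label))                # 6 frames needed for this label
```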
Do your test data and training data come from the same distribution? Are they both generated with reptcha?
The data is not generated; it is my own. From the full set, 2000 images were randomly held out as the test set, and the remaining 30000+ are used for training and validation.
I suspect your test config was not fully updated, so the images at test time do not have the same distribution as during training.
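If the config is the issue, one quick thing to verify is that test-time preprocessing resizes images to exactly the shape used during training. A minimal sketch, assuming OpenCV and hypothetical constant names rather than this repo's actual config fields:

```python
# Test-time preprocessing sketch: the constants must equal the training
# config, otherwise the number of time steps (and the input distribution)
# differs between training and testing.
import cv2

TRAIN_IMG_WIDTH, TRAIN_IMG_HEIGHT = 120, 45   # must match the training config

def load_test_image(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Resize to the exact training shape before feeding the network.
    img = cv2.resize(img, (TRAIN_IMG_WIDTH, TRAIN_IMG_HEIGHT))
    return img.astype('float32') / 255.0
```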
The test set has the same distribution as the training set, and these datasets run fine with other code. I still think it is an issue between the image size and the network: (160,60) does not converge even after 100k+ iterations, while (120,45) converges after just a few rounds, which may not be genuine convergence. I will look into it more carefully and share the cause once I find it. If anyone has run into something similar, please share your solution.