text-detection-ctpn

WARNING:tensorflow:Variable Conv/weights missing in checkpoint data/vgg_16.ckpt (new version)

Open hcnhatnam opened this issue 6 years ago • 12 comments

When I try to run main/train.py from the new version of the code, I get the following output:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    WARNING:tensorflow:Variable Conv/weights missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable Conv/biases missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable BiLSTM/bidirectional_rnn/fw/lstm_cell/kernel missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable BiLSTM/bidirectional_rnn/fw/lstm_cell/bias missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable BiLSTM/bidirectional_rnn/bw/lstm_cell/kernel missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable BiLSTM/bidirectional_rnn/bw/lstm_cell/bias missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable BiLSTM/weights missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable BiLSTM/biases missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable bbox_pred/weights missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable bbox_pred/biases missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable cls_pred/weights missing in checkpoint data/vgg_16.ckpt
    WARNING:tensorflow:Variable cls_pred/biases missing in checkpoint data/vgg_16.ckpt
    2019-01-13 14:04:34.044515: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-01-13 14:04:34.044907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
    pciBusID: 0000:00:04.0
    totalMemory: 11.17GiB freeMemory: 11.10GiB
    2019-01-13 14:04:34.044943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
    2019-01-13 14:04:34.354396: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-01-13 14:04:34.354475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
    2019-01-13 14:04:34.354491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
    2019-01-13 14:04:34.354732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10869 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
    Find 3422 images
    Find 3422 images
    3422 training images in data/dataset/mlt/
    3422 training images in data/dataset/mlt/
    Find 3422 images
    3422 training images in data/dataset/mlt/
    Find 3422 images
    3422 training images in data/dataset/mlt

I have waited for more than 3 hours, but the output is still stuck at "3422 training images in data/dataset/mlt". Can someone please help me with this? Is the WARNING:tensorflow ... output the cause?

hcnhatnam • Jan 13 '19 15:01

I have the same problem and am waiting for a solution.

NamNguyenThanh • Jan 13 '19 16:01

@hcnhatnam @NamNguyenThanh Hi, the warnings can be ignored, since VGG is only used as a pretrained model and some parameters remain unused. As for the second problem, the training process did not start. Check the dataset path; the cause may be lines 55 and 57 of utils/dataset/data_provider.py. You should check whether the data has been generated correctly.

eragonruan • Jan 14 '19 07:01
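A quick way to act on the "check whether the data has been generated correctly" advice is to confirm that every training image has a non-empty label file. The sketch below assumes, without confirming it against data_provider.py, that images live in `<root>/image` and labels in `<root>/label/<image name>.txt`; `check_dataset` is a hypothetical helper, not part of the repository.

```python
import os

# Assumed layout (not confirmed against data_provider.py):
#   <root>/image/<name>.<jpg|jpeg|png>
#   <root>/label/<name>.txt   (one xmin,ymin,xmax,ymax line per text segment)
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def check_dataset(root):
    """Hypothetical helper: return (num_images, unlabelled, empty_labels).

    unlabelled   -- image file names with no matching label file
    empty_labels -- image file names whose label file has no content
    """
    image_dir = os.path.join(root, "image")
    label_dir = os.path.join(root, "label")
    num_images, unlabelled, empty_labels = 0, [], []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTS:
            continue  # skip anything that is not an image
        num_images += 1
        label_path = os.path.join(label_dir, stem + ".txt")
        if not os.path.isfile(label_path):
            unlabelled.append(name)
            continue
        with open(label_path) as f:
            if not f.read().strip():
                empty_labels.append(name)
    return num_images, unlabelled, empty_labels
```

Calling `check_dataset("data/dataset/mlt")` under that assumed layout returns the counts; if most images come back unlabelled or empty, the label-generation step probably did not run or wrote its output somewhere else.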

Thank you @eragonruan. I resolved the problem.

hcnhatnam • Jan 14 '19 12:01

Hi @eragonruan, I am trying to run the new version of ctpn, but when I run the script to generate the data, it shows "too many values to unpack (expected 4)". My dataset is in the following format: x1,y1,x2,y2,x3,y3,x4,y4,txt_transcription. Should I convert the dataset into 4 rectangle values?

Thanks

guddulrk • Jan 15 '19 05:01

    line = line.strip().split(",")
    x_min, y_min, x_max, y_max = map(int, line)
    bbox.append([x_min, y_min, x_max, y_max, 1])

Does x_min = x1, y_min = y1, x_max = x2, and y_max = y2?

Thanks

guddulrk • Jan 15 '19 06:01
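The message itself comes from Python's tuple unpacking: splitting an x1,y1,...,y4,txt_transcription line on commas yields nine fields, and the quoted loader line unpacks them into exactly four names. The sketch below reproduces that and shows a stricter, hypothetical parser (`parse_segment_line` is not part of the repository) that reports the field count instead.

```python
def parse_segment_line(line):
    """Hypothetical stricter version of the quoted loader line.

    Expects exactly xmin,ymin,xmax,ymax and returns [xmin, ymin, xmax, ymax, 1],
    mirroring the bbox.append(...) call in the quoted snippet.
    """
    fields = line.strip().split(",")
    if len(fields) != 4:
        raise ValueError(
            f"expected 4 values (xmin,ymin,xmax,ymax), got {len(fields)}: "
            "8-point labels need to be converted to segments first"
        )
    x_min, y_min, x_max, y_max = map(int, fields)
    return [x_min, y_min, x_max, y_max, 1]


def reproduce_original_error(line):
    """Run the original unpacking on a line; return the ValueError message or None."""
    try:
        x_min, y_min, x_max, y_max = map(int, line.strip().split(","))
    except ValueError as exc:
        return str(exc)
    return None
```

Because `map` is lazy, the unpacking stops after pulling a fifth number, so the failure is the "too many values to unpack" error rather than an `int()` error on the transcription text.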

@guddulrk Hi, you should run split_label.py first to prepare the dataset. Its input is x1,y1,x2,y2,x3,y3,x4,y4, and its output is text segments labelled as xmin,ymin,xmax,ymax. Then you can start the training process.

eragonruan • Jan 15 '19 07:01
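To make the conversion concrete, here is a simplified sketch of the idea: take the axis-aligned bounding box of the four corner points, then cut it into fixed-width vertical slices, since CTPN predicts text as a sequence of narrow fixed-width proposals (16 pixels in the CTPN paper). This is not the repository's split_label.py, which may also rescale images and treat slanted boxes differently; the function names here are hypothetical.

```python
def quad_to_box(coords):
    """Axis-aligned box (xmin, ymin, xmax, ymax) enclosing x1,y1,...,x4,y4."""
    xs, ys = coords[0::2], coords[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def split_box(box, width=16):
    """Cut a box into width-pixel slices aligned to a width-pixel grid.

    Each slice spans [x, x + width - 1]; the first starts at the grid line at or
    left of xmin, and slices continue until xmax is covered.
    """
    xmin, ymin, xmax, ymax = box
    first = (xmin // width) * width
    return [(x, ymin, x + width - 1, ymax) for x in range(first, xmax + 1, width)]


def convert_line(line, width=16):
    """Turn one 'x1,y1,...,x4,y4,transcription' line into 'xmin,ymin,xmax,ymax' lines."""
    fields = line.strip().split(",")
    coords = [int(float(v)) for v in fields[:8]]  # the transcription is ignored
    return [",".join(map(str, seg)) for seg in split_box(quad_to_box(coords), width)]
```

For the line `10,20,50,22,49,40,11,38,hello` this yields four segments, from `0,20,15,40` to `48,20,63,40`. Slices are snapped to the grid rather than clipped, so the outer ones can extend slightly past the original box; clipping them would be an equally reasonable choice.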

@eragonruan I think CPU users who don't have much processing power should reduce save_checkpoint_steps in train.py (line 24). A slow processor takes longer to execute each step, so reducing the checkpoint interval makes it easier to see that training is progressing instead of waiting minutes or hours. Hope people find this helpful!

atal-manuja • Jan 30 '19 13:01
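The reasoning is simple arithmetic: the wall-clock time between checkpoints is save_checkpoint_steps multiplied by the time per step. The sketch assumes train.py saves when `step % save_checkpoint_steps == 0` and uses made-up numbers (6 seconds per step on a slow CPU, intervals of 2000 and 100 steps) purely for illustration; they are not measurements from this repository.

```python
def seconds_between_checkpoints(save_checkpoint_steps, seconds_per_step):
    """Wall-clock time between two saves if a save happens every N steps."""
    return save_checkpoint_steps * seconds_per_step


def save_steps(max_steps, save_checkpoint_steps):
    """Steps at which a `step % save_checkpoint_steps == 0` check fires (step 0 excluded)."""
    return [s for s in range(1, max_steps + 1) if s % save_checkpoint_steps == 0]


# Hypothetical numbers for a slow CPU, not measured values.
for interval in (2000, 100):
    hours = seconds_between_checkpoints(interval, 6.0) / 3600
    print(f"save every {interval} steps -> about {hours:.2f} h between checkpoints")
```

With these numbers the first interval is about 3.33 hours and the second about 0.17 hours. If save_checkpoint_steps is declared with tf.app.flags in train.py (as the restore flag quoted later in this thread is), it should also be possible to override it on the command line, for example `python main/train.py --save_checkpoint_steps=100`, instead of editing line 24.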

@hcnhatnam You wrote "Thank you @eragonruan. I resolved the problem." Could you tell me how you solved it?

thograce • Apr 04 '19 15:04

I ran train.py but got the error below. The checkpoint_mlt/ folder was downloaded as described in the README. Can anyone help me?

The error is below:

    Traceback (most recent call last):
      File "/home/simplify/OCR/work?_ctpn/main/train.py", line 118, in <module>
        tf.app.run()
      File "/home/simplify/OCR/work?_ctpn/venv/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "/home/simplify/OCR/work?_ctpn/main/train.py", line 81, in main
        restore_step = int(ckpt.split('.')[0].split('_')[-1])
    ValueError: invalid literal for int() with base 10: ''

simplify23 • May 12 '19 03:05
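One way to get exactly this error is a checkpoint path that contains a '.' before the file name, for example one starting with `./` or `../`: `ckpt.split('.')[0]` is then an empty string, so `int('')` fails. The sketch reproduces that and shows a hypothetical, more defensive parser (`parse_restore_step` is not in the repository) that reads the step only from the file name; `ctpn_50000.ckpt` is used purely as an example name.

```python
import os
import re


def original_restore_step(ckpt):
    """The expression from train.py line 81 in the traceback above."""
    return int(ckpt.split('.')[0].split('_')[-1])


def parse_restore_step(ckpt):
    """Hypothetical helper: read N from a checkpoint named like 'prefix_N.ckpt'.

    Only the file name is inspected, so dots or underscores in the directory
    part of the path cannot confuse it.
    """
    name = os.path.basename(ckpt)
    match = re.search(r"_(\d+)\.ckpt", name)
    if match is None:
        raise ValueError(f"no step number found in checkpoint name {name!r}")
    return int(match.group(1))


# Without a leading "./" the original expression works...
print(original_restore_step("checkpoints_mlt/ctpn_50000.ckpt"))
# ...with one, ckpt.split('.')[0] is '' and int('') raises the reported ValueError.
try:
    original_restore_step("./checkpoints_mlt/ctpn_50000.ckpt")
except ValueError as exc:
    print("original:", exc)
print(parse_restore_step("./checkpoints_mlt/ctpn_50000.ckpt"))
```

If you are not resuming from a checkpoint at all, setting the restore flag to False, as suggested later in this thread, presumably skips this parsing step entirely.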

@simplify23 I have the same problem. Could you tell me how you solved it? Thank you.

qingqing625 • Oct 10 '19 03:10

Hi, please help: I ran train.py but got this error:

    restore_step = int(ckpt.split('.')[0].split('_')[-1])
    ValueError: invalid literal for int() with base 10: ''

sevany • Apr 17 '20 03:04

@sevany change tf.app.flags.DEFINE_boolean('restore', True, '') to tf.app.flags.DEFINE_boolean('restore', False, '') in train.py file, if you're training from scratch.

prabhakar-sivanesan • Aug 03 '20 20:08