FastMaskRCNN

run train.py Input to reshape is a tensor with 1 values, but the requested shape has 0

Open ycui123 opened this issue 7 years ago • 22 comments

Hi everyone,

I got the error "Input to reshape is a tensor with 1 values, but the requested shape has 0" while training the model. I ran python train/train.py and the error occurred in the middle of training. Sometimes it happens around iter 30+, sometimes around iter 300+. I don't know how to fix it.

Has anyone else run into the same problem?

ycui123 avatar Jul 19 '17 19:07 ycui123

I think this issue is the same as issue #88.

AlexGfocus avatar Jul 20 '17 00:07 AlexGfocus

I got this problem too. I use version 1.1.0, and the suggestions in issue #88 may not help.

AihahaFox avatar Jul 21 '17 01:07 AihahaFox

But did you fix the problem?

ycui123 avatar Jul 21 '17 14:07 ycui123

This may be caused by writing a tfrecord example when the instance number is 0. Try to skip that kind of image when writing the tfrecord.

LovPe avatar Aug 03 '17 07:08 LovPe

@LovPe,

What do you mean by instance number is 0?


Sharathnasa avatar Aug 03 '17 09:08 Sharathnasa

@Sharathnasa [image: https://user-images.githubusercontent.com/11580882/28951542-4bd69af6-78fe-11e7-8f54-e60ffdd869ee.png]

The code marked in red is the instance number; you can step into it to see the details. I think newer versions of tensorflow (1.2+) do some checking when resizing, so when you have an example whose instance number is zero, it blocks the reading thread.

I added an if condition when writing the tf record to make sure the instance number is > 0. I rewrote the writing process, so the details may differ. After this, the training project works well on tf1.3rc. [image: https://user-images.githubusercontent.com/11580882/28951579-a5993e68-78fe-11e7-906f-d01bbcba6bf0.png]
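
Since the screenshots are not reproduced here, the idea of the check can be sketched in plain Python. This is only an illustration under assumptions: the function name split_by_instances and the (image_id, gt_boxes) pair layout are hypothetical and are not taken from convert_coco.py.

```python
def split_by_instances(samples):
    """Separate (image_id, gt_boxes) pairs into writable and skipped ones.

    Hypothetical helper: `samples` is any iterable of pairs where
    gt_boxes is a sequence with one entry per annotated instance.
    """
    keep, skipped = [], []
    for image_id, gt_boxes in samples:
        if len(gt_boxes) > 0:
            # At least one instance: safe to serialize into the record file.
            keep.append((image_id, gt_boxes))
        else:
            # Zero instances: skip so no empty example reaches the reader.
            skipped.append(image_id)
    return keep, skipped
```

Returning the skipped ids as well makes it easy to log how many images were left out of the record file.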

LovPe avatar Aug 04 '17 02:08 LovPe

@LovPe But I'm using tensorflow 1.1 and am still facing the same issue. Shall I go ahead and add the condition to verify?


Sharathnasa avatar Aug 04 '17 04:08 Sharathnasa

@Sharathnasa I did not use tensorflow 1.1, but you can try it.

LovPe avatar Aug 04 '17 05:08 LovPe

@LovPe May I know which line your code starts from? I ran into the same problem too. Your assistance is highly appreciated.

WKChung1028 avatar Sep 21 '17 03:09 WKChung1028

@WKChung1028 In my implementation it is tf_rcnn->lib->datasets->convert_coco.py, line 226, but I rewrote the code, so I'm not sure the lines match!

LovPe avatar Sep 21 '17 03:09 LovPe

@LovPe Do you mean adding the code right after these two lines, like this?

        mask = mask.astype(np.uint8)
        assert masks.shape[0] == gt_boxes.shape[0], 'Shape Error'
        show _rsesult =False break
        if gt_boxes.shape[0]>0
        img_raw = img.tostring()
        mask_raw = mask.tostring()

        example = _to_tfexample_coco_raw(
          	img_id,
          	img_raw,
          	mask_raw,
          	height, width, gt_boxes.shape[0],
          	gt_boxes.tostring(), masks.tostring())
        
        tfrecord_writer.write(example.SerializeToString())

WKChung1028 avatar Sep 21 '17 03:09 WKChung1028

Something like that: add if gt_boxes.shape[0] > 0 before calling the _to_tfexample_coco_raw() function.

LovPe avatar Sep 21 '17 04:09 LovPe

        img = img.astype(np.uint8)
        assert img.size == width * height * 3, '%s' % str(img_id)

        img_raw = img.tostring()
        mask_raw = mask.tostring()
        if gt_boxes.shape[0] > 0:
            example = _to_tfexample_coco_raw(
              img_id,
              img_raw,
              mask_raw,
              height, width, gt_boxes.shape[0],
              gt_boxes.tostring(), masks.tostring())
        
            tfrecord_writer.write(example.SerializeToString())

    sys.stdout.write('\n')
    sys.stdout.flush()

I added the condition like this, and it ran fine for a few iterations.

But it stopped again and showed the error below:

['background']
iter 126: image-id:0516249, time:14.586(sec), regular_loss: 0.247718, total-loss 0.6493(0.0921, 0.5463, 0.000000, 0.0109, 0.0000), instances: 11, batch:(26|114, 0|33, 0|0)
labels []
classes ['background']
Traceback (most recent call last):
  File "train/train.py", line 339, in <module>
    train()
  File "train/train.py", line 335, in train
    coord.join(threads)
  File "/home/ubuntu/Documents/WK/my_project/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/ubuntu/Documents/WK/my_project/local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "/home/ubuntu/Documents/WK/my_project/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/ubuntu/Documents/WK/my_project/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape has 0
  [[Node: Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](DecodeRaw_1, Reshape/shape)]]

WKChung1028 avatar Sep 21 '17 07:09 WKChung1028

@LovPe I am a beginner and hope to learn more about the solution from you. Thank you.

WKChung1028 avatar Sep 21 '17 07:09 WKChung1028

@WKChung1028
1/ Make sure your tfrecord file was generated by the new code.
2/ I use python 3.5 with tf1.3.

LovPe avatar Sep 21 '17 07:09 LovPe

  1. Does that mean I need to delete all the tfrecord files that the previous code generated in the record folder before I run the new code?

  2. I am using tf 1.4 and python 2.7 on CPU.

WKChung1028 avatar Sep 21 '17 08:09 WKChung1028

1/ Yes.
2/ I think that is all right.
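
One cautious way to clear out old record files before regenerating them is to move them aside rather than delete them, so they stay recoverable. The sketch below assumes Python 3; the helper name move_stale_records, the directory arguments, and the "*.tfrecord" pattern are all hypothetical and should be adapted to however the records are actually named.

```python
import glob
import os
import shutil


def move_stale_records(records_dir, backup_dir, pattern="*.tfrecord"):
    """Move matching record files from records_dir into backup_dir.

    Hypothetical helper; returns the sorted base names of the moved files.
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for path in sorted(glob.glob(os.path.join(records_dir, pattern))):
        name = os.path.basename(path)
        # Moving (not deleting) keeps the old records available if needed.
        shutil.move(path, os.path.join(backup_dir, name))
        moved.append(name)
    return moved
```

After this, rerunning the conversion script writes a fresh set of records into an empty location, so no example from the old code can be read by mistake.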

LovPe avatar Sep 21 '17 09:09 LovPe

Hi, I've looked into this bug. The offending line is actually in coco.py:

    gt_boxes = tf.decode_raw(features['label/gt_boxes'], tf.float32)

The problem is that calling tf.decode_raw on an empty string ('') returns the tensor [0], i.e.

    tf.decode_raw('', tf.float32).eval() == array([ 0.], dtype=float32)

I think we should handle the empty string as a special case here. I'll try to make a fix.

I'm not sure about "delete all tfrecord files generated by the previous code", because that way you drop all images without markup (annotations). They could still be useful for training.
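
The observation above matches the error message: the decoded tensor holds 1 value while the requested shape, with zero instances, holds 0. A reader-side guard can be sketched in plain Python. This is not TensorFlow code and not the change made in the pull request; the name decode_gt_boxes and the assumption of 5 float32 values per box are illustrative only.

```python
from array import array


def decode_gt_boxes(raw, num_instances, box_len=5):
    """Decode float32 bytes into num_instances rows of box_len values.

    Hypothetical sketch: with zero instances the payload is ignored,
    so an empty (or placeholder) byte string never reaches the reshape.
    """
    if num_instances == 0:
        return []
    values = array("f")
    values.frombytes(raw)
    expected = num_instances * box_len
    if len(values) != expected:
        raise ValueError(
            "expected %d values, got %d" % (expected, len(values)))
    # Split the flat buffer into one row per instance.
    return [list(values[i * box_len:(i + 1) * box_len])
            for i in range(num_instances)]
```

Handling the empty case at read time keeps images without annotations in the dataset, which is the trade-off raised in this comment.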

anatolix avatar Oct 19 '17 14:10 anatolix

fix: https://github.com/CharlesShang/FastMaskRCNN/pull/160

anatolix avatar Oct 19 '17 15:10 anatolix

If the tfrecord files were generated with tf1.2 but training runs on tf1.5, will this problem occur?

zzdgit avatar Mar 06 '18 02:03 zzdgit

@LovPe I followed your advice and the problem has been partially solved. But the program still stopped at iter 124 with the same error. I want to know how to fix it completely. I use tf1.4 and python2.7.

LiuPearl1 avatar Jun 12 '18 13:06 LiuPearl1

@LiuPearl1 I used python3.6 and tf1.2 when solving this problem. I have since given up on this project and now work with the original implementation on caffe2. I found that some details differ between the two projects.

LovPe avatar Jun 13 '18 14:06 LovPe