FastMaskRCNN
Running train.py: Input to reshape is a tensor with 1 values, but the requested shape has 0
Hi everyone,
I got the error "Input to reshape is a tensor with 1 values, but the requested shape has 0" while training the model. I ran python train/train.py and the error occurred in the middle of training: sometimes around iteration 30+, sometimes around iteration 300+. I don't know how to fix it.
Does anyone else have the same problem?
I think this issue is the same as issue #88.
I got this problem too. I use version 1.1.0, and the suggestions in issue #88 may not help.
But did you fix the problem?
This may be caused by writing a TFRecord example when the instance number is 0. Try to skip that kind of image when writing the TFRecord.
@LovPe,
What do you mean by "instance number is 0"?
-- Regards, Sharath Kumar R
The only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle. As with all matters of the heart, you’ll know when you find it.” - Steve Jobs (1955 - 2011)
@Sharathnasa
The code marked in red (screenshot: https://user-images.githubusercontent.com/11580882/28951542-4bd69af6-78fe-11e7-8f54-e60ffdd869ee.png) is the instance number; you can step into it to see the details. I think newer TensorFlow versions (1.2+) perform a check when reshaping, so an example whose instance number is zero will block the reading thread.
I added an if condition when writing the TFRecord to make sure the instance number is > 0 (screenshot: https://user-images.githubusercontent.com/11580882/28951579-a5993e68-78fe-11e7-906f-d01bbcba6bf0.png). I rewrote the writing process, so the details may differ from yours, but after this change training works well on TF 1.3rc.
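The guard described above can be sketched as a plain-Python pre-filter that runs before records are serialized. This is a hypothetical helper (split_by_instance_count is not part of FastMaskRCNN or convert_coco.py), and the record layout, a dict with img_id and gt_boxes keys, is an assumption for illustration. It also returns the ids of the skipped images, so you can check how many images without instances get dropped.

```python
def split_by_instance_count(records):
    """Separate records with at least one instance from empty ones.

    Hypothetical helper: each record is assumed to be a dict whose
    "gt_boxes" entry is a sequence with one box per instance.
    Returns (kept_records, skipped_img_ids).
    """
    kept, skipped = [], []
    for record in records:
        # Same idea as the `gt_boxes.shape[0] > 0` guard: only records
        # with at least one instance should be written to the TFRecord.
        if len(record["gt_boxes"]) > 0:
            kept.append(record)
        else:
            skipped.append(record["img_id"])
    return kept, skipped


if __name__ == "__main__":
    records = [
        {"img_id": 1, "gt_boxes": [(0, 0, 10, 10, 1)]},
        {"img_id": 2, "gt_boxes": []},
    ]
    kept, skipped = split_by_instance_count(records)
    print(len(kept), skipped)  # 1 [2]
```

Only the kept records would then be passed to the TFRecord writer.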
@LovPe But I'm using TensorFlow 1.1 and still facing the same issue. Should I go ahead and add the condition to verify?
@Sharathnasa I did not use TensorFlow 1.1, but you can try it.
@LovPe May I know which line your code starts from? I have the same problem. Your assistance is highly appreciated.
@WKChung1028 In my implementation it is tf_rcnn->lib->datasets->convert_coco.py, line 226, but I rewrote the code, so I'm not sure the line numbers match!
@LovPe Do you mean you added the code right after these two lines, like this?

```python
mask = mask.astype(np.uint8)
assert masks.shape[0] == gt_boxes.shape[0], 'Shape Error'
show_result = False
break
if gt_boxes.shape[0] > 0:
    img_raw = img.tostring()
    mask_raw = mask.tostring()
    example = _to_tfexample_coco_raw(
        img_id,
        img_raw,
        mask_raw,
        height, width, gt_boxes.shape[0],
        gt_boxes.tostring(), masks.tostring())
    tfrecord_writer.write(example.SerializeToString())
```
Something like that. Add `if gt_boxes.shape[0] > 0:` before calling the `_to_tfexample_coco_raw()` function.
```python
img = img.astype(np.uint8)
assert img.size == width * height * 3, '%s' % str(img_id)
img_raw = img.tostring()
mask_raw = mask.tostring()
# Skip images with no ground-truth instances so that no empty
# gt_boxes entry is written to the TFRecord.
if gt_boxes.shape[0] > 0:
    example = _to_tfexample_coco_raw(
        img_id,
        img_raw,
        mask_raw,
        height, width, gt_boxes.shape[0],
        gt_boxes.tostring(), masks.tostring())
    tfrecord_writer.write(example.SerializeToString())

sys.stdout.write('\n')
sys.stdout.flush()
```
I added the condition like this, and it ran fine for a few iterations, but then it stopped again and showed the error below:
```
['background']
iter 126: image-id:0516249, time:14.586(sec), regular_loss: 0.247718, total-loss 0.6493(0.0921, 0.5463, 0.000000, 0.0109, 0.0000), instances: 11, batch:(26|114, 0|33, 0|0)
labels
[]
classes
['background']
Traceback (most recent call last):
  File "train/train.py", line 339, in
```
@LovPe I am a beginner and would like to learn more about the solution from you. Thank you.
@WKChung1028
1. Make sure your TFRecord file was generated by the new code.
2. I use Python 3.5 with TF 1.3.
- Does that mean I need to delete all the TFRecord files generated by the previous code before I run the new code?
- I am using TF 1.4 and Python 2.7 on CPU.
1. Yes.
2. I think that is all right.
Hi, I've looked into this bug. The offending line is in coco.py: `gt_boxes = tf.decode_raw(features['label/gt_boxes'], tf.float32)`. The problem is that calling `tf.decode_raw` on an empty string (`''`) returns the tensor `[0]`, i.e. `tf.decode_raw('', tf.float32).eval() == array([ 0.], dtype=float32)`. I think we should handle the empty string in a special way here. I'll try to make a fix.
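A minimal sketch of the "handle the empty string specially" idea, written with Python's struct module rather than TensorFlow so it runs on its own. decode_gt_boxes is a hypothetical helper, not the code in the linked pull request, and 5 float32 values per box is an assumption about how gt_boxes is packed. When the record reports zero instances, the helper returns an empty list without looking at the bytes, so a stray decoded value never reaches a reshape; otherwise it checks that the byte count matches before unpacking.

```python
import struct

BOX_FIELDS = 5  # assumed per-box layout: x1, y1, x2, y2, class id
FLOAT_SIZE = struct.calcsize("=f")  # float32 size in bytes with '=' (4)


def decode_gt_boxes(raw, num_instances):
    """Decode packed float32 boxes into a list of tuples (hypothetical helper).

    Zero instances are a special case: the bytes are ignored and an empty
    list is returned, so no spurious value is ever grouped into boxes.
    """
    if num_instances == 0:
        return []
    expected = num_instances * BOX_FIELDS
    if len(raw) != expected * FLOAT_SIZE:
        raise ValueError("expected %d float32 values (%d bytes), got %d bytes"
                         % (expected, expected * FLOAT_SIZE, len(raw)))
    # '=' means native byte order, matching numpy's tostring() output.
    values = struct.unpack("=%df" % expected, raw)
    # Group the flat tuple into one tuple of BOX_FIELDS values per instance.
    return [values[i:i + BOX_FIELDS] for i in range(0, expected, BOX_FIELDS)]


if __name__ == "__main__":
    packed = struct.pack("=5f", 10.0, 20.0, 30.0, 40.0, 1.0)
    print(decode_gt_boxes(packed, 1))  # [(10.0, 20.0, 30.0, 40.0, 1.0)]
    print(decode_gt_boxes(b"", 0))     # []
```

Keying the special case on the stored instance count, rather than on the decoded length, keeps images without annotations in the dataset instead of dropping them when the TFRecord is written.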
I'm actually not sure about "delete all TFRecord files generated by the previous code", because by doing this you also drop all images without annotations, and they could still be useful for training.
fix: https://github.com/CharlesShang/FastMaskRCNN/pull/160
If the TFRecord file was generated with TF 1.2 but training runs with TF 1.5, will this problem occur?
@LovPe I followed your advice and the problem has been partially solved, but the program still stopped at iter 124 with the same error. I want to know how to fix it completely. I use TF 1.4 and Python 2.7.
@LiuPearl1 I used Python 3.6 and TF 1.2 when solving this problem. I have since given up on this project and now work on the original implementation in Caffe2; I found that some details differ between the two projects.