simple-faster-rcnn-pytorch RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

Open aiyodiulehuner opened this issue 3 years ago • 19 comments

Traceback (most recent call last):
  File "train.py", line 142, in <module>
    fire.Fire()
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "train.py", line 109, in train
    _bboxes, _labels, scores = trainer.faster_rcnn.predict([ori_img], visualize=True)
  File "/workspace/model/faster_rcnn.py", line 19, in new_f
    return f(*args,**kwargs)
  File "/workspace/model/faster_rcnn.py", line 233, in predict
    roi_cls_loc, roi_scores, rois, _ = self(img, scale=scale)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/model/faster_rcnn.py", line 133, in forward
    h, rois, roi_indices)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/model/faster_rcnn_vgg16.py", line 149, in forward
    pool = pool.view(pool.size(0), -1) # flatten; pool size == [300, channel(500) w(7) * h(7)]
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

If you suspect this is an IPython 7.16.1 bug, please report it at: https://github.com/ipython/ipython/issues or send an email to the IPython mailing list.

You can print a more detailed traceback right now with "%tb", or use "%debug" to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via: %config Application.verbose_crash=True

How can I fix this problem? Any help from the experts here would be appreciated.

Nov 25 '20 12:11 aiyodiulehuner
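
For readers landing here from the error message alone, the failing step can be reproduced in isolation. The sketch below is a NumPy analogue, not the repository's PyTorch code: NumPy rejects the same ambiguous reshape with a ValueError. The array shape (0 boxes, 512 channels, 7x7 bins) and both function names are assumptions chosen for illustration. It shows that the error only means "zero RoIs reached the head", and that spelling out the flattened size avoids the ambiguity.

```python
import numpy as np

# Stand-in for the pooled RoI features when no RoI survives.
# 512 channels and 7x7 bins are illustrative values, not read from the repo.
empty_pool = np.empty((0, 512, 7, 7), dtype=np.float32)


def flatten_with_inferred_dim(pool):
    # Mirrors pool.view(pool.size(0), -1): the second dimension is inferred,
    # which is impossible when the array holds zero elements.
    return pool.reshape(pool.shape[0], -1)


def flatten_with_explicit_dim(pool):
    # Spelling out the flattened size leaves nothing to infer,
    # so an empty batch is reshaped without complaint.
    n, c, h, w = pool.shape
    return pool.reshape(n, c * h * w)


try:
    flatten_with_inferred_dim(empty_pool)
except ValueError as err:
    print("inferred dim failed:", err)

print("explicit dim shape:", flatten_with_explicit_dim(empty_pool).shape)
```

Avoiding the reshape error this way only hides the symptom. The open question in this thread is why no RoIs are left in the first place.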

Try passing train inside the parentheses of fire.Fire().

Dec 08 '20 08:12 yuechenshun

Hello, I ran into this problem too. Passing train to fire.Fire() gives the same error.

Dec 08 '20 12:12 wulele2

After adding train, Adam works, but SGD still fails with the same error.

Dec 09 '20 11:12 aiyodiulehuner

Also, why would passing train to fire.Fire() help with this problem?

Dec 09 '20 11:12 aiyodiulehuner

Hello, are the results with Adam worse than with SGD?

Dec 09 '20 11:12 wulele2

It seems so. I haven't finished tuning it yet.

Dec 09 '20 11:12 aiyodiulehuner

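Could you post an mAP once you get it working? Thanks. By debugging, I traced the cause to the loc2bbox function: when dw and dh are computed, exp overflows, which turns the coordinates of the 128 candidate boxes generated by the RPN into nan. The network then clips those candidate boxes away, so fewer than 128 are left. In the end the batch becomes 0, which raises the error. After that the loss becomes nan and the gradients explode.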

Dec 09 '20 11:12 wulele2
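
The overflow described above can be illustrated with a small NumPy sketch of loc2bbox-style decoding. This is not the repository's implementation: decode_boxes, its clip argument, and the constant DWH_CLIP are hypothetical names, and the (y_min, x_min, y_max, x_max) layout is assumed. Clamping dh and dw before exp is one possible guard. The value log(1000/16) is borrowed from torchvision's box decoding and is used here only as an example.

```python
import math

import numpy as np

# Hypothetical clamp: caps exp(dh) and exp(dw) at 1000/16 = 62.5.
DWH_CLIP = math.log(1000.0 / 16.0)


def decode_boxes(src_bbox, loc, clip=None):
    """Decode (dy, dx, dh, dw) offsets against (y_min, x_min, y_max, x_max) boxes."""
    src_h = src_bbox[:, 2] - src_bbox[:, 0]
    src_w = src_bbox[:, 3] - src_bbox[:, 1]
    src_ctr_y = src_bbox[:, 0] + 0.5 * src_h
    src_ctr_x = src_bbox[:, 1] + 0.5 * src_w

    dy, dx, dh, dw = loc[:, 0], loc[:, 1], loc[:, 2], loc[:, 3]
    if clip is not None:
        # Guard: bound the log-scale offsets so exp() cannot overflow.
        dh = np.minimum(dh, clip)
        dw = np.minimum(dw, clip)

    ctr_y = dy * src_h + src_ctr_y
    ctr_x = dx * src_w + src_ctr_x
    with np.errstate(over="ignore"):  # silence the overflow warning in the demo
        h = np.exp(dh) * src_h
        w = np.exp(dw) * src_w

    return np.stack(
        [ctr_y - 0.5 * h, ctr_x - 0.5 * w, ctr_y + 0.5 * h, ctr_x + 0.5 * w],
        axis=1,
    )


anchors = np.array([[0.0, 0.0, 16.0, 16.0]])
wild_loc = np.array([[0.0, 0.0, 1000.0, 1000.0]])  # a diverged prediction

raw = decode_boxes(anchors, wild_loc)
clamped = decode_boxes(anchors, wild_loc, clip=DWH_CLIP)
print("unclamped is finite:", bool(np.isfinite(raw).all()))
print("clamped is finite:  ", bool(np.isfinite(clamped).all()))
```

A clamp keeps the decoded boxes finite, but it does not explain why the network predicts such large offsets, so a diverging loss still needs to be addressed separately.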

OK, will do.

Dec 09 '20 12:12 aiyodiulehuner

Hi, has this problem been solved?

Apr 26 '21 17:04 hippoula

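I ran into this problem too. I later found it was also in the loc2bbox function: a division by zero occurred when dw and dh were computed. It turned out to be a data problem: some of the bboxes were lines or points. After filtering them out, the error went away.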

Sep 01 '21 02:09 xlhuang132

Me too. I don't know why this happens.

Sep 10 '21 08:09 SpeitzerPatrick

Could you explain how you filtered out the line and point bboxes? I've run into the same problem.

Sep 26 '21 15:09 Lyndon-wong

Just use the bbox coordinates to check whether each box is actually a rectangle, and drop it if it isn't.

Sep 26 '21 15:09 xlhuang132
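
The check described above can be sketched as a vectorized filter. This is an illustration under assumptions, not the commenter's actual code: drop_degenerate_boxes is a hypothetical helper, and the boxes are assumed to be rows of (y_min, x_min, y_max, x_max) with one label per row.

```python
import numpy as np


def drop_degenerate_boxes(bbox, label):
    """Keep only boxes with strictly positive height and width.

    bbox:  (R, 4) array of (y_min, x_min, y_max, x_max) rows (assumed layout).
    label: (R,) array of class ids aligned with bbox.
    """
    bbox = np.asarray(bbox, dtype=np.float32).reshape(-1, 4)
    label = np.asarray(label)
    height = bbox[:, 2] - bbox[:, 0]
    width = bbox[:, 3] - bbox[:, 1]
    # A line has zero height or zero width; a point has both.
    keep = (height > 0) & (width > 0)
    return bbox[keep], label[keep]


boxes = np.array(
    [
        [10, 10, 50, 60],  # proper rectangle
        [20, 30, 20, 80],  # horizontal line: zero height
        [5, 5, 5, 5],      # point: zero height and width
    ],
    dtype=np.float32,
)
labels = np.array([3, 7, 1])

clean_boxes, clean_labels = drop_degenerate_boxes(boxes, labels)
print(clean_boxes, clean_labels)
```

Filtering labels together with boxes keeps the two arrays aligned. An image whose boxes are all degenerate ends up with zero boxes and may need to be skipped entirely.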

OK, thanks. I'll try it today.

Sep 27 '21 07:09 Lyndon-wong

Hello, I wrote a script to clean the dataset, but it didn't find any bboxes that are line segments or points?

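from data.voc_dataset import VOCBboxDataset
from utils.config import opt
from tqdm import tqdm


def is_poor_bbox(bbox):
    """Return True if any box has zero (or negative) height or width.

    Each row of bbox is (y_min, x_min, y_max, x_max).
    """
    for y_min, x_min, y_max, x_max in bbox:
        # `>=` also catches inverted boxes, not only lines and points.
        if x_min >= x_max or y_min >= y_max:
            return True
    return False


def main():
    db = VOCBboxDataset('.' + opt.voc_data_dir)
    poor_box_id = []
    for i in tqdm(range(len(db.ids))):
        img, bbox, label, difficult = db.get_example(i)
        if is_poor_bbox(bbox):
            poor_box_id.append(i)
            # Format the index into the message instead of passing it as a second argument.
            print("image %d has a poor bbox" % i)
    print("%d images with poor bboxes: %s" % (len(poor_box_id), poor_box_id))


# Compare with `==`, not `is`: identity comparison against a string literal
# is unreliable, and if it evaluates to False, main() never runs.
if __name__ == '__main__':
    main()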

Could you take a look? It's really strange.

Sep 28 '21 08:09 Lyndon-wong

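I misunderstood a couple of days ago and thought the dataset itself was the problem. Today I cleaned the rois coming out of loc2bbox instead, and it still fails: I deleted every roi from loc2bbox that contained an inf value, but that removes too many, so I end up with the same error. I counted, and roughly 16000~18000 rois are removed each time. I don't know what to do.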

Sep 29 '21 08:09 Lyndon-wong
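
Counting the non-finite rois, as described above, can be sketched as follows. The helper names finite_roi_mask and summarize_rois are hypothetical, and the rois are assumed to be an (R, 4) array. The sketch reports how many rows contain inf or nan and fails with a readable message when nothing is left, instead of the opaque reshape error later on.

```python
import numpy as np


def finite_roi_mask(roi):
    """Boolean mask over rows: True where all four coordinates are finite."""
    roi = np.asarray(roi, dtype=np.float64).reshape(-1, 4)
    return np.isfinite(roi).all(axis=1)


def summarize_rois(roi):
    """Count total, non-finite, and kept rois; raise if none survive."""
    mask = finite_roi_mask(roi)
    stats = {
        "total": int(mask.size),
        "non_finite": int((~mask).sum()),
        "kept": int(mask.sum()),
    }
    if stats["total"] > 0 and stats["kept"] == 0:
        # Failing early points at the real cause: the decoded boxes diverged.
        raise ValueError("all %d rois are non-finite" % stats["total"])
    return stats


sample = np.array(
    [
        [0.0, 0.0, 16.0, 16.0],
        [np.inf, 0.0, np.inf, 16.0],
        [np.nan, np.nan, np.nan, np.nan],
    ]
)
print(summarize_rois(sample))
```

If most rois are non-finite on every iteration, as reported in the comment above, dropping them is unlikely to help. The predicted offsets themselves have diverged.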

I switched to torchvision's vgg16 weights and testing works now, so it may be a problem with the caffe weights... I'll make do with that for now.

Sep 29 '21 14:09 Lyndon-wong

OK. The dataset I used is COCO and I haven't used VOC, so the cause may well be different.

Sep 29 '21 14:09 xlhuang132

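The problem should be in the computation of dw and dh in the loc2bbox function. If this error appears, training most likely just happened to hit the bug. Switching to a different weight initialization, or adjusting the learning rate or the optimizer, usually avoids it.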

Mar 21 '23 16:03 deepxzy
