
RuntimeError: copy_if failed to synchronize: device-side assert triggered

Open. donglin8506 opened this issue 5 years ago • 1 comment

@JingChaoLiu @liuxuebo0 Hello. I keep hitting the error below during training and I don't know the cause. Someone suggested that the learning rate is too large, but what learning rate would be appropriate? Could you suggest a solution?

Traceback (most recent call last):
  File "tools/train_net.py", line 186, in <module>
    main()
  File "tools/train_net.py", line 179, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 85, in train
    arguments,
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/engine/trainer.py", line 75, in do_train
    loss_dict = model(images, targets)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 204, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 207, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 223, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 115, in forward_for_single_feature_map
    boxlist = remove_small_boxes(boxlist, self.min_size)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
    (ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: device-side assert triggered
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
Traceback (most recent call last):
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 238, in <module>
    main()
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 234, in main
    cmd=process.args)

donglin8506, Oct 11 '19 10:10

Check your training data, including the class labels and the box coordinates.

congjianting, Jan 22 '20 05:01
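
One way to act on the suggestion above is to scan the annotations before training. The following is only a minimal sketch, not part of PMTD or maskrcnn_benchmark: the function name `find_bad_annotations`, the `(x1, y1, x2, y2)` pixel box format, and the rule that valid labels lie in `[0, num_classes)` are all assumptions made for illustration, so adapt them to your own dataset loader.

```python
import math
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels. Adjust if your data differs.
Box = Sequence[float]


def find_bad_annotations(
    boxes: List[Box],
    labels: List[int],
    num_classes: int,
    image_size: Tuple[int, int],
) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for annotations that look invalid.

    Hypothetical helper: it assumes valid labels satisfy 0 <= label < num_classes
    and that image_size is (width, height).
    """
    width, height = image_size
    problems: List[Tuple[int, str]] = []

    if len(boxes) != len(labels):
        problems.append((-1, "boxes and labels differ in length"))

    for i, (box, label) in enumerate(zip(boxes, labels)):
        # NaN or infinite coordinates make every later comparison meaningless.
        if any(math.isnan(v) or math.isinf(v) for v in box):
            problems.append((i, "non-finite coordinate"))
            continue

        x1, y1, x2, y2 = box
        if not 0 <= label < num_classes:
            problems.append((i, f"label {label} outside [0, {num_classes})"))
        if x2 - x1 <= 0 or y2 - y1 <= 0:
            problems.append((i, "box has zero or negative width/height"))
        if x1 < 0 or y1 < 0 or x2 > width or y2 > height:
            problems.append((i, "box extends outside the image"))

    return problems


if __name__ == "__main__":
    sample_boxes = [(10, 10, 50, 50), (20, 20, 20, 40), (-5, 0, 30, 30)]
    sample_labels = [1, 1, 1]
    for index, reason in find_bad_annotations(
        sample_boxes, sample_labels, num_classes=2, image_size=(100, 100)
    ):
        print(index, reason)
```

If the data passes checks like these, keep in mind that CUDA kernels run asynchronously, so the Python line named in a device-side assert traceback is not necessarily the operation that failed. Rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually gives a more accurate location.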