AssertionError in training
I ran into this AssertionError while training the model. Can anyone help?
```
Traceback (most recent call last):
  File "/anaconda3/envs/fasterRCNN/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/CrowdDet/tools/train.py", line 109, in train_worker
    do_train_epoch(net, data_iter, optimizer, rank, epoch_id, train_config)
  File "/CrowdDet/tools/train.py", line 58, in do_train_epoch
    assert torch.isfinite(total_loss).all(), outputs
AssertionError: {'loss_rpn_cls': tensor(nan, device='cuda:0', grad_fn=<MulBackward0>), 'loss_rpn_loc': tensor(inf, device='cuda:0', grad_fn=<MulBackward0>), 'loss_rcnn_loc': tensor(nan, device='cuda:0', grad_fn=<MulBackward0>), 'loss_rcnn_cls': tensor(nan, device='cuda:0', grad_fn=<MulBackward0>)}
```
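The assertion in `do_train_epoch` only reports that the summed loss is not finite. A minimal sketch for narrowing down which term diverged, assuming the loss values have already been converted to plain Python floats (for example with `.item()` on each tensor); `nonfinite_terms` is a hypothetical helper, not part of CrowdDet:

```python
import math


def nonfinite_terms(losses):
    """Return the sorted names of loss terms whose value is NaN or +/-inf.

    `losses` maps a loss name to a plain float. With PyTorch tensors,
    convert first, e.g. {k: v.item() for k, v in outputs.items()}.
    """
    return sorted(name for name, value in losses.items() if not math.isfinite(value))


# Float stand-ins for the tensor values in the AssertionError above.
outputs = {
    "loss_rpn_cls": float("nan"),
    "loss_rpn_loc": float("inf"),
    "loss_rcnn_loc": float("nan"),
    "loss_rcnn_cls": float("nan"),
}
print(nonfinite_terms(outputs))
# ['loss_rcnn_cls', 'loss_rcnn_loc', 'loss_rpn_cls', 'loss_rpn_loc']
```

By the time the assertion fires, every term is already non-finite, so logging this check at every iteration, not only on failure, would show which term goes bad first.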
Try running it a few more times; this error sometimes occurs at the beginning of training.
Sorry, but we've tried several times and keep getting the same error at almost the same iteration in the first epoch.
Have you modified the code or the data? Errors like this rarely occur. Try changing the dataset initialization order or decreasing the learning rate.
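On decreasing the learning rate: the startup log posted later in this thread prints a learning rate of 0.00750 with 3 GPUs and a mini batch size of 2. That figure is consistent with a linear scaling rule, lr = base_lr × batch_per_gpu × num_gpus, with base_lr = 0.00125. Both the rule and that base value are assumptions inferred from the printed numbers, not read from CrowdDet's config.py. If the rule holds, lowering the learning rate means lowering the base value, and adding GPUs raises the effective rate. A small sketch (`scaled_lr` and `BASE_LR` are hypothetical names):

```python
def scaled_lr(base_lr, batch_per_gpu, num_gpus):
    """Linear scaling rule (assumed): the rate grows with the total batch size."""
    return base_lr * batch_per_gpu * num_gpus


# Assumed base rate that reproduces the 0.00750 printed in the thread's log.
BASE_LR = 0.00125
print(f"{scaled_lr(BASE_LR, batch_per_gpu=2, num_gpus=3):.5f}")      # 0.00750
# Halving the base rate halves the effective rate for the same setup.
print(f"{scaled_lr(BASE_LR / 2, batch_per_gpu=2, num_gpus=3):.5f}")  # 0.00375
```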
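I have another AssertionError in training. Can you help me?
```
Num of GPUs:3, learning rate:0.00750, mini batch size:2,
train_epoch:30, iter_per_epoch:2500, decay_epoch:[24, 27]
Init multi-processing training...
Traceback (most recent call last):
  File "/media/xcj/data/xcj/CrowdDet/tools/train.py", line 174, in <module>
    run_train()
  File "/media/xcj/data/xcj/CrowdDet/tools/train.py", line 171, in run_train
    multi_train(args, config, Network)
  File "/media/xcj/data/xcj/CrowdDet/tools/train.py", line 155, in multi_train
    torch.multiprocessing.spawn(train_worker, nprocs=num_gpus, args=(train_config, network, config))
  File "/home/xcj/anaconda3/envs/py_cpn/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/xcj/anaconda3/envs/py_cpn/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/xcj/anaconda3/envs/py_cpn/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/media/xcj/data/xcj/CrowdDet/tools/train.py", line 102, in train_worker
    crowdhuman = CrowdHuman(config, if_train=True)
  File "../lib/data/CrowdHuman.py", line 20, in __init__
    self.records = misc_utils.load_json_lines(source)
  File "../lib/utils/misc_utils.py", line 11, in load_json_lines
    assert os.path.exists(fpath)
AssertionError
```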
It looks like the annotation file path is wrong. Check `train_source` and `eval_source` in config.py.
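The traceback ends at `assert os.path.exists(fpath)` inside `load_json_lines`, so the configured annotation path does not exist as seen by the training process. A minimal pre-flight check along those lines, run from the same directory you launch `train.py` from; `check_sources` is a hypothetical helper and the paths below are placeholders, not the repository's defaults:

```python
import os


def check_sources(sources):
    """Print the absolute path each config entry resolves to and whether a
    file exists there; return the names of the entries that are missing.

    Relative paths are resolved against the current working directory.
    os.path.isfile is slightly stricter than the os.path.exists assert,
    since it also rejects a path that points at a directory.
    """
    missing = []
    for name, path in sources.items():
        resolved = os.path.abspath(path)
        found = os.path.isfile(resolved)
        print(f"{name}: {resolved} -> {'ok' if found else 'MISSING'}")
        if not found:
            missing.append(name)
    return missing


# Placeholder paths (hypothetical): substitute the train_source and
# eval_source values from your own config.py.
missing = check_sources({
    "train_source": "/path/to/annotation_train.odgt",
    "eval_source": "/path/to/annotation_val.odgt",
})
if missing:
    print("Fix these entries in config.py:", ", ".join(missing))
```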
Hi, have you solved this? I seem to be running into a similar problem with `load_json_lines`: it can't find the path to the JSON file.
Just check whether the file is actually at that path.
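A bare `assert os.path.exists(fpath)` does not say which path it tried. A minimal sketch of a reader that names the path in its error, assuming the annotation file is JSON Lines (one JSON object per line, which is how CrowdHuman's `.odgt` files are laid out); `load_json_lines_verbose` is a hypothetical name, not CrowdDet's `misc_utils.load_json_lines`:

```python
import json
import os


def load_json_lines_verbose(fpath):
    """Read a JSON Lines file into a list of dicts.

    Raises FileNotFoundError naming the path as given, the absolute path it
    resolved to, and the current working directory.
    """
    if not os.path.exists(fpath):
        raise FileNotFoundError(
            f"annotation file not found: {fpath!r} "
            f"(resolved to {os.path.abspath(fpath)!r}, cwd={os.getcwd()!r})"
        )
    with open(fpath, "r", encoding="utf-8") as f:
        # Skip blank lines, e.g. a trailing empty line at the end of the file.
        return [json.loads(line) for line in f if line.strip()]
```

Including the working directory in the message is useful here because the traceback references `../lib/...`, which suggests the script runs from `tools/`. In that case, a relative annotation path is looked up from `tools/`, not from the repository root.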
OK. I'm running this command: `python3 eval_json.py -f your_json_path.json`
and I get the following error:
Traceback (most recent call last):
File "eval_json.py", line 36, in