FFB6D
Training stops at the second epoch with "AssertionError: scalar should be 0D"
Hi Yisheng,
Something goes wrong when I train on the Linemod dataset. Specifically, when training reaches the second epoch, the program stops and raises the error shown below. I hope you can help. Thanks a lot, and best wishes.
At the first epoch, everything looks normal.
#######################################################
train_dataset_size: 186
cls_id in lm_dataset.py 1
cls_id in lm_dataset.py 1
test_dataset_size: 1050
test_dataset_size: 1050
loading resnet34 pretrained mdl.
loading resnet34 pretrained mdl.
local_rank: 0
local_rank: 1
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
/home/extend/gy/miniconda3/envs/FFB/lib/python3.7/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
Totally train 775 iters per gpu.
ape_epochs: 0%| | 0/25 [00:00<?, ?it/s]/home/extend/gy/miniconda3/envs/FFB/lib/python3.7/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
Totally train 775 iters per gpu.
ape_epochs: 0%| | 0/25 [00:00<?, ?it/s]kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt [00:00<?, ?it/s]
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
/home/extend/gy/miniconda3/envs/FFB/lib/python3.7/site-packages/torch/nn/functional.py:2416: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/extend/gy/miniconda3/envs/FFB/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/extend/gy/miniconda3/envs/FFB/lib/python3.7/site-packages/torch/nn/functional.py:2416: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/extend/gy/miniconda3/envs/FFB/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/extend/gy/miniconda3/envs/FFB/lib/python3.7/site-packages/torch/nn/modules/container.py:100: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
/home/extend/gy/miniconda3/envs/FFB/lib/python3.7/site-packages/torch/nn/modules/container.py:100: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
/home/extend/gy/pyproject/FFB6D/ffb6d/models/loss.py:30: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
logpt = F.log_softmax(input)
/home/extend/gy/pyproject/FFB6D/ffb6d/models/loss.py:30: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
logpt = F.log_softmax(input)
ape_epochs: 4%|█▎ | 1/25 [00:52<20:48, 52.02s/it]kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt/it, total_it=31]
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt/it, total_it=31]
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
###########################################################
But once the second training epoch completes, training stops, like this:
###########################################################
ape_epochs: 8%|██▌ | 2/25 [01:30<16:57, 44.25s/it]kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt/it, total_it=62]
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt/it, total_it=61]
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
ape_epochs: 8%|██▌ kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txtit [00:00, ?it/s]
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
kps_pth in get_kps: datasets/linemod/kps_orb9_fps/ape_8_kps.txt
loss_rgbd_seg 0.11550371497869491
loss_kp_of 11.04855725969587
loss_ctr_of 1.3788805430276054
loss_all 12.658445249285016
loss_target 12.658445249285016
acc_rgbd 84.9538414137704
ape_epochs: 8%|██▌ | 2/25 [02:42<31:09, 81.28s/it]
Traceback (most recent call last):
File "train_lm.py", line 697, in
Hi, I have run into the same problem. If you have solved it, please give me some advice.
I tried downgrading Python and tensorboard, but that didn't work. On the other hand, "write_scalar" only saves data for display in tensorboard and does not affect training, so I commented it out. With that change, the training stage runs normally. I hope this helps you.
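As an alternative to commenting the logging call out entirely, the call could be wrapped so that a failed assertion is reported and skipped instead of stopping training. This is only a minimal sketch: `safe_log` and `fake_add_scalar` are hypothetical names invented for illustration, and the stand-in logger merely imitates the "scalar should be 0D" check rather than reproducing the real tensorboard writer.

```python
def safe_log(log_fn, *args, **kwargs):
    """Call a logging function, but never let an AssertionError stop training.

    Returns True if the value was logged, False if the call was skipped.
    """
    try:
        log_fn(*args, **kwargs)
        return True
    except AssertionError as err:
        # Logging is only for visualization, so report the problem and go on.
        print(f"skipping log call: {err}")
        return False


def fake_add_scalar(tag, value, step):
    # Hypothetical stand-in for a tensorboard-style writer method.
    # It rejects anything with a length, imitating the 0-D check.
    if hasattr(value, "__len__"):
        raise AssertionError("scalar should be 0D")


logged_ok = safe_log(fake_add_scalar, "acc_rgbd", 84.95, 2)
logged_bad = safe_log(fake_add_scalar, "acc_rgbd", [84.0, 86.0], 2)
```

Here the first call is logged and the second is skipped with a printed message, so the epoch loop would keep running either way.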
Have you solved this problem? I have run into the same issue and would appreciate any help.
Judging from the earlier reply, this should not affect training accuracy, so you can ignore it.
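If you print acc_dict, you will see that acc_rgbd is not zero-dimensional. I'm not entirely sure whether taking its mean is valid, but doing so makes it zero-dimensional, and the code then runs normally.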
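The averaging idea above could look roughly like the sketch below. It assumes the metric values are plain numbers, lists, or NumPy arrays (a GPU tensor would first need something like `.detach().cpu()`), and the helper names `to_0d_scalar` and `sanitize_scalars` are hypothetical, not part of FFB6D. Whether the mean is the right summary for acc_rgbd is still an open question here.

```python
import numpy as np


def to_0d_scalar(value):
    """Reduce a number, list, or array to a single Python float via the mean."""
    arr = np.asarray(value, dtype=np.float64)
    if arr.size == 0:
        raise ValueError("cannot reduce an empty value to a scalar")
    # For an already 0-D input, the mean is simply the value itself.
    return float(arr.mean())


def sanitize_scalars(metrics):
    """Return a copy of a metrics dict in which every value is a 0-D float."""
    return {name: to_0d_scalar(value) for name, value in metrics.items()}


# Example values made up for illustration; acc_rgbd is deliberately not 0-D.
acc_dict = {"acc_rgbd": [84.0, 86.0], "loss_all": 12.5}
clean = sanitize_scalars(acc_dict)
print(clean)
```

Passing the sanitized dict to the logging step, instead of the raw one, would keep the tensorboard curves while avoiding the 0-D assertion.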