
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.

Open liuqiang1227 opened this issue 1 year ago • 4 comments

During training, the following warning is issued:

$ python -m torch.distributed.launch --nproc_per_node=1 train.py --data /home/a/CustomMap --object yepian

Loading Model...
ready to train!
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
Train Epoch: 1 [0/5000 (0%)] Loss: 0.035234406590462 Local Rank: 0
Train Epoch: 1 [1600/5000 (32%)] Loss: 0.004101661965251 Local Rank: 0
Train Epoch: 1 [3200/5000 (64%)] Loss: 0.002820491790771 Local Rank: 0
Train Epoch: 1 [4800/5000 (96%)] Loss: 0.002345604822040 Local Rank: 0
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
Train Epoch: 2 [0/5000 (0%)] Loss: 0.003404101822525 Local Rank: 0
Train Epoch: 2 [1600/5000 (32%)] Loss: 0.003067462937906 Local Rank: 0
Train Epoch: 2 [3200/5000 (64%)] Loss: 0.003346179146320 Local Rank: 0
Train Epoch: 2 [4800/5000 (96%)] Loss: 0.003464318113402 Local Rank: 0

liuqiang1227 avatar Jun 18 '24 02:06 liuqiang1227

We had this warning forever and I never was able to figure out why. You can ignore it.

TontonTremblay avatar Jun 22 '24 15:06 TontonTremblay
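
For anyone who wants to track down which logged value triggers the warning, here is a minimal sketch. It is not taken from the DOPE training script. It assumes only that the writer exposes `add_scalar(tag, scalar_value, global_step)`, as tensorboardX's `SummaryWriter` does. The helper name `log_scalar_checked`, the `RecordingWriter` stand-in, and the tags `loss/train` and `loss/test` are all hypothetical and used for illustration only.

```python
import math


def log_scalar_checked(writer, tag, value, step):
    """Log `value` under `tag`, but report non-finite values instead of logging them.

    `writer` is assumed to expose add_scalar(tag, scalar_value, global_step),
    as tensorboardX's SummaryWriter does. Returns True if the value was logged.
    """
    value = float(value)  # also accepts a 0-dim tensor or a NumPy scalar
    if not math.isfinite(value):
        # Name the offending tag so the source of the NaN/Inf can be traced.
        print(f"non-finite scalar skipped: tag={tag!r} step={step} value={value}")
        return False
    writer.add_scalar(tag, value, step)
    return True


class RecordingWriter:
    """Hypothetical stand-in for a SummaryWriter, used only for this demo."""

    def __init__(self):
        self.logged = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        self.logged.append((tag, scalar_value, global_step))


writer = RecordingWriter()
log_scalar_checked(writer, "loss/train", 0.0352, 0)        # finite: logged
log_scalar_checked(writer, "loss/test", float("nan"), 0)   # NaN: reported, not logged
```

Routing the training script's scalar logging through a wrapper like this would print the tag and step of each non-finite value, which may help show whether the warning comes from a loss that is actually diverging or from some other logged quantity.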

We had this warning forever and I never was able to figure out why. You can ignore it.

Does this warning indicate a problem with training or inference? For example, my model finishes training, but inference doesn't actually work: the model can't detect any of the objects I want.

AusertDream avatar Feb 24 '25 04:02 AusertDream

Can you share belief maps and images from training and testing? The warning is definitely not causing the issue.

TontonTremblay avatar Feb 24 '25 20:02 TontonTremblay

Yesterday, after your comment, I tried again, using my own box data and cracker.pth to continue training. It worked, with no NaN warning, though the results were not very good. I know my dataset is too small, which explains the poor results. This issue really has nothing to do with the warning. I have figured out my problem. Thank you for the reply.

AusertDream avatar Feb 25 '25 01:02 AusertDream