About total loss for fine-tuning on custom dataset
Thanks for the great work! We are training the model on a custom dataset with batch size 2~12, checkpoint loading with strict=False, and lr 1e-4. Why is the oscillation amplitude of the loss relatively large, and why is convergence so slow? Is this normal?
```
INFO 2025-08-13 03:04:13,361 general.py: 113: Train Epoch: [21][ 410/1000000] | Batch Time: 4.7583 (8.9744) | Data Time: 0.0341 (1.5500) | Mem (GB): 72.0000 (71.1290) | Time Elapsed: 01d 18h 31m | Loss/train_loss_objective: 0.6324 (0.7573) | Loss/train_loss_camera: 0.1550 (0.1644) | Loss/train_loss_T: 0.0858 (0.0905) | Loss/train_loss_R: 0.0527 (0.0589) | Loss/train_loss_FL: 0.0332 (0.0301) | Loss/train_loss_conf_depth: -0.1798 (-0.1156) | Loss/train_loss_reg_depth: 0.0308 (0.0435) | Loss/train_loss_grad_depth: 0.0063 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 3.2478 (2.9475) | Grad/camera: 0.3169 (0.4070)
INFO 2025-08-13 03:04:15,121 general.py: 113: Train Epoch: [21][ 411/1000000] | Batch Time: 1.7599 (8.9568) | Data Time: 0.0704 (1.5464) | Mem (GB): 72.0000 (71.1311) | Time Elapsed: 01d 18h 31m | Loss/train_loss_objective: 0.6191 (0.7571) | Loss/train_loss_camera: 0.1573 (0.1644) | Loss/train_loss_T: 0.0751 (0.0905) | Loss/train_loss_R: 0.0720 (0.0589) | Loss/train_loss_FL: 0.0203 (0.0301) | Loss/train_loss_conf_depth: -0.1996 (-0.1157) | Loss/train_loss_reg_depth: 0.0256 (0.0435) | Loss/train_loss_grad_depth: 0.0068 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 3.6096 (2.9491) | Grad/camera: 0.3657 (0.4069)
INFO 2025-08-13 03:04:18,365 general.py: 113: Train Epoch: [21][ 412/1000000] | Batch Time: 3.2438 (8.9430) | Data Time: 0.0271 (1.5427) | Mem (GB): 72.0000 (71.1332) | Time Elapsed: 01d 18h 31m | Loss/train_loss_objective: 0.8482 (0.7574) | Loss/train_loss_camera: 0.1787 (0.1645) | Loss/train_loss_T: 0.0941 (0.0905) | Loss/train_loss_R: 0.0574 (0.0589) | Loss/train_loss_FL: 0.0546 (0.0301) | Loss/train_loss_conf_depth: -0.0999 (-0.1157) | Loss/train_loss_reg_depth: 0.0463 (0.0435) | Loss/train_loss_grad_depth: 0.0082 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 2.1256 (2.9471) | Grad/camera: 0.3033 (0.4067)
INFO 2025-08-13 03:04:20,716 general.py: 113: Train Epoch: [21][ 413/1000000] | Batch Time: 2.3512 (8.9271) | Data Time: 0.0378 (1.5391) | Mem (GB): 72.0000 (71.1353) | Time Elapsed: 01d 18h 31m | Loss/train_loss_objective: 0.4587 (0.7567) | Loss/train_loss_camera: 0.1122 (0.1644) | Loss/train_loss_T: 0.0553 (0.0904) | Loss/train_loss_R: 0.0457 (0.0589) | Loss/train_loss_FL: 0.0224 (0.0301) | Loss/train_loss_conf_depth: -0.1472 (-0.1158) | Loss/train_loss_reg_depth: 0.0363 (0.0435) | Loss/train_loss_grad_depth: 0.0086 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.0880 (2.9426) | Grad/camera: 0.3091 (0.4064)
INFO 2025-08-13 03:04:52,811 general.py: 113: Train Epoch: [21][ 414/1000000] | Batch Time: 32.0945 (8.9829) | Data Time: 0.0380 (1.5355) | Mem (GB): 72.0000 (71.1373) | Time Elapsed: 01d 18h 32m | Loss/train_loss_objective: 0.7819 (0.7567) | Loss/train_loss_camera: 0.1904 (0.1644) | Loss/train_loss_T: 0.1041 (0.0904) | Loss/train_loss_R: 0.0752 (0.0589) | Loss/train_loss_FL: 0.0221 (0.0301) | Loss/train_loss_conf_depth: -0.2030 (-0.1158) | Loss/train_loss_reg_depth: 0.0262 (0.0435) | Loss/train_loss_grad_depth: 0.0066 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 2.9609 (2.9426) | Grad/camera: 0.4561 (0.4066)
INFO 2025-08-13 03:04:54,806 general.py: 113: Train Epoch: [21][ 415/1000000] | Batch Time: 1.9954 (8.9661) | Data Time: 0.0494 (1.5319) | Mem (GB): 72.0000 (71.1394) | Time Elapsed: 01d 18h 32m | Loss/train_loss_objective: 0.8730 (0.7569) | Loss/train_loss_camera: 0.1989 (0.1644) | Loss/train_loss_T: 0.1119 (0.0905) | Loss/train_loss_R: 0.0763 (0.0589) | Loss/train_loss_FL: 0.0215 (0.0301) | Loss/train_loss_conf_depth: -0.1608 (-0.1159) | Loss/train_loss_reg_depth: 0.0323 (0.0434) | Loss/train_loss_grad_depth: 0.0069 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 2.2664 (2.9410) | Grad/camera: 0.3328 (0.4064)
INFO 2025-08-13 03:04:59,192 general.py: 113: Train Epoch: [21][ 416/1000000] | Batch Time: 4.3860 (8.9551) | Data Time: 0.0342 (1.5283) | Mem (GB): 72.0000 (71.1415) | Time Elapsed: 01d 18h 32m | Loss/train_loss_objective: 0.6500 (0.7567) | Loss/train_loss_camera: 0.1540 (0.1644) | Loss/train_loss_T: 0.0832 (0.0905) | Loss/train_loss_R: 0.0600 (0.0589) | Loss/train_loss_FL: 0.0215 (0.0301) | Loss/train_loss_conf_depth: -0.1622 (-0.1160) | Loss/train_loss_reg_depth: 0.0350 (0.0434) | Loss/train_loss_grad_depth: 0.0074 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 2.0329 (2.9388) | Grad/camera: 0.4411 (0.4065)
INFO 2025-08-13 03:05:02,506 general.py: 113: Train Epoch: [21][ 417/1000000] | Batch Time: 3.3136 (8.9416) | Data Time: 0.0662 (1.5248) | Mem (GB): 72.0000 (71.1435) | Time Elapsed: 01d 18h 32m | Loss/train_loss_objective: 0.8338 (0.7571) | Loss/train_loss_camera: 0.1788 (0.1645) | Loss/train_loss_T: 0.0941 (0.0905) | Loss/train_loss_R: 0.0709 (0.0590) | Loss/train_loss_FL: 0.0276 (0.0301) | Loss/train_loss_conf_depth: -0.1099 (-0.1159) | Loss/train_loss_reg_depth: 0.0417 (0.0434) | Loss/train_loss_grad_depth: 0.0081 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.0913 (2.9344) | Grad/camera: 0.2613 (0.4061)
INFO 2025-08-13 03:05:06,465 general.py: 113: Train Epoch: [21][ 418/1000000] | Batch Time: 3.9590 (8.9297) | Data Time: 0.2613 (1.5218) | Mem (GB): 72.0000 (71.1456) | Time Elapsed: 01d 18h 32m | Loss/train_loss_objective: 0.5862 (0.7564) | Loss/train_loss_camera: 0.1233 (0.1643) | Loss/train_loss_T: 0.0680 (0.0904) | Loss/train_loss_R: 0.0393 (0.0589) | Loss/train_loss_FL: 0.0319 (0.0301) | Loss/train_loss_conf_depth: -0.0853 (-0.1158) | Loss/train_loss_reg_depth: 0.0489 (0.0434) | Loss/train_loss_grad_depth: 0.0061 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.0870 (2.9300) | Grad/camera: 0.2935 (0.4059)
INFO 2025-08-13 03:05:10,440 general.py: 113: Train Epoch: [21][ 419/1000000] | Batch Time: 3.9753 (8.9180) | Data Time: 0.0514 (1.5183) | Mem (GB): 72.0000 (71.1476) | Time Elapsed: 01d 18h 32m | Loss/train_loss_objective: 0.8124 (0.7565) | Loss/train_loss_camera: 0.1743 (0.1643) | Loss/train_loss_T: 0.0991 (0.0904) | Loss/train_loss_R: 0.0680 (0.0589) | Loss/train_loss_FL: 0.0143 (0.0301) | Loss/train_loss_conf_depth: -0.1140 (-0.1158) | Loss/train_loss_reg_depth: 0.0485 (0.0434) | Loss/train_loss_grad_depth: 0.0064 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 2.7989 (2.9297) | Grad/camera: 0.4787 (0.4060)
INFO 2025-08-13 03:05:12,880 general.py: 113: Train Epoch: [21][ 420/1000000] | Batch Time: 2.4393 (8.9026) | Data Time: 0.0560 (1.5148) | Mem (GB): 72.0000 (71.1496) | Time Elapsed: 01d 18h 32m | Loss/train_loss_objective: 0.5165 (0.7559) | Loss/train_loss_camera: 0.1243 (0.1642) | Loss/train_loss_T: 0.0662 (0.0903) | Loss/train_loss_R: 0.0441 (0.0589) | Loss/train_loss_FL: 0.0280 (0.0301) | Loss/train_loss_conf_depth: -0.1469 (-0.1159) | Loss/train_loss_reg_depth: 0.0348 (0.0434) | Loss/train_loss_grad_depth: 0.0071 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.7203 (2.9268) | Grad/camera: 0.4995 (0.4063)
INFO 2025-08-13 03:05:16,549 general.py: 113: Train Epoch: [21][ 421/1000000] | Batch Time: 3.6694 (8.8902) | Data Time: 0.0355 (1.5113) | Mem (GB): 72.0000 (71.1517) | Time Elapsed: 01d 18h 32m | Loss/train_loss_objective: 0.4920 (0.7557) | Loss/train_loss_camera: 0.1088 (0.1642) | Loss/train_loss_T: 0.0608 (0.0903) | Loss/train_loss_R: 0.0360 (0.0588) | Loss/train_loss_FL: 0.0240 (0.0301) | Loss/train_loss_conf_depth: -0.1067 (-0.1159) | Loss/train_loss_reg_depth: 0.0473 (0.0434) | Loss/train_loss_grad_depth: 0.0073 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 2.5746 (2.9260) | Grad/camera: 0.3220 (0.4061)
INFO 2025-08-13 03:06:08,333 general.py: 113: Train Epoch: [21][ 422/1000000] | Batch Time: 51.7838 (8.9916) | Data Time: 0.2038 (1.5082) | Mem (GB): 72.0000 (71.1537) | Time Elapsed: 01d 18h 33m | Loss/train_loss_objective: 0.8643 (0.7564) | Loss/train_loss_camera: 0.1794 (0.1643) | Loss/train_loss_T: 0.1040 (0.0904) | Loss/train_loss_R: 0.0574 (0.0588) | Loss/train_loss_FL: 0.0359 (0.0301) | Loss/train_loss_conf_depth: -0.0894 (-0.1157) | Loss/train_loss_reg_depth: 0.0492 (0.0435) | Loss/train_loss_grad_depth: 0.0075 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.1981 (2.9219) | Grad/camera: 0.4155 (0.4061)
INFO 2025-08-13 03:06:11,004 general.py: 113: Train Epoch: [21][ 423/1000000] | Batch Time: 2.6711 (8.9767) | Data Time: 0.0518 (1.5048) | Mem (GB): 72.0000 (71.1557) | Time Elapsed: 01d 18h 33m | Loss/train_loss_objective: 0.9103 (0.7567) | Loss/train_loss_camera: 0.1925 (0.1643) | Loss/train_loss_T: 0.1082 (0.0904) | Loss/train_loss_R: 0.0725 (0.0589) | Loss/train_loss_FL: 0.0234 (0.0301) | Loss/train_loss_conf_depth: -0.1051 (-0.1157) | Loss/train_loss_reg_depth: 0.0440 (0.0435) | Loss/train_loss_grad_depth: 0.0091 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 0.7054 (2.9167) | Grad/camera: 0.3903 (0.4060)
INFO 2025-08-13 03:06:13,250 general.py: 113: Train Epoch: [21][ 424/1000000] | Batch Time: 2.2464 (8.9608) | Data Time: 0.1410 (1.5016) | Mem (GB): 72.0000 (71.1576) | Time Elapsed: 01d 18h 33m | Loss/train_loss_objective: 1.4167 (0.7586) | Loss/train_loss_camera: 0.2805 (0.1647) | Loss/train_loss_T: 0.1659 (0.0906) | Loss/train_loss_R: 0.1016 (0.0590) | Loss/train_loss_FL: 0.0262 (0.0301) | Loss/train_loss_conf_depth: -0.0531 (-0.1155) | Loss/train_loss_reg_depth: 0.0582 (0.0435) | Loss/train_loss_grad_depth: 0.0090 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.4153 (2.9131) | Grad/camera: 0.3905 (0.4060)
INFO 2025-08-13 03:06:15,986 general.py: 113: Train Epoch: [21][ 425/1000000] | Batch Time: 2.7360 (8.9462) | Data Time: 0.0267 (1.4981) | Mem (GB): 72.0000 (71.1596) | Time Elapsed: 01d 18h 33m | Loss/train_loss_objective: 0.7387 (0.7585) | Loss/train_loss_camera: 0.1717 (0.1647) | Loss/train_loss_T: 0.0922 (0.0907) | Loss/train_loss_R: 0.0667 (0.0590) | Loss/train_loss_FL: 0.0256 (0.0301) | Loss/train_loss_conf_depth: -0.1581 (-0.1157) | Loss/train_loss_reg_depth: 0.0313 (0.0435) | Loss/train_loss_grad_depth: 0.0072 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 2.1311 (2.9113) | Grad/camera: 0.4231 (0.4060)
INFO 2025-08-13 03:06:20,591 general.py: 113: Train Epoch: [21][ 426/1000000] | Batch Time: 4.6050 (8.9360) | Data Time: 0.0501 (1.4947) | Mem (GB): 72.0000 (71.1616) | Time Elapsed: 01d 18h 33m | Loss/train_loss_objective: 1.1120 (0.7593) | Loss/train_loss_camera: 0.2427 (0.1649) | Loss/train_loss_T: 0.1263 (0.0907) | Loss/train_loss_R: 0.1032 (0.0591) | Loss/train_loss_FL: 0.0263 (0.0300) | Loss/train_loss_conf_depth: -0.1448 (-0.1158) | Loss/train_loss_reg_depth: 0.0366 (0.0434) | Loss/train_loss_grad_depth: 0.0069 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 4.8452 (2.9158) | Grad/camera: 0.2838 (0.4058)
INFO 2025-08-13 03:06:24,587 general.py: 113: Train Epoch: [21][ 427/1000000] | Batch Time: 3.9955 (8.9245) | Data Time: 0.0943 (1.4915) | Mem (GB): 72.0000 (71.1636) | Time Elapsed: 01d 18h 33m | Loss/train_loss_objective: 0.7366 (0.7592) | Loss/train_loss_camera: 0.1537 (0.1648) | Loss/train_loss_T: 0.0746 (0.0906) | Loss/train_loss_R: 0.0640 (0.0592) | Loss/train_loss_FL: 0.0302 (0.0300) | Loss/train_loss_conf_depth: -0.0893 (-0.1156) | Loss/train_loss_reg_depth: 0.0489 (0.0435) | Loss/train_loss_grad_depth: 0.0085 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.7525 (2.9131) | Grad/camera: 0.5438 (0.4061)
INFO 2025-08-13 03:06:28,565 general.py: 113: Train Epoch: [21][ 428/1000000] | Batch Time: 3.9781 (8.9130) | Data Time: 0.0818 (1.4882) | Mem (GB): 72.0000 (71.1655) | Time Elapsed: 01d 18h 34m | Loss/train_loss_objective: 0.6610 (0.7591) | Loss/train_loss_camera: 0.1307 (0.1648) | Loss/train_loss_T: 0.0739 (0.0906) | Loss/train_loss_R: 0.0377 (0.0591) | Loss/train_loss_FL: 0.0382 (0.0301) | Loss/train_loss_conf_depth: -0.0527 (-0.1155) | Loss/train_loss_reg_depth: 0.0532 (0.0435) | Loss/train_loss_grad_depth: 0.0069 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 2.8888 (2.9130) | Grad/camera: 0.4466 (0.4062)
INFO 2025-08-13 03:06:33,897 general.py: 113: Train Epoch: [21][ 429/1000000] | Batch Time: 5.3323 (8.9046) | Data Time: 0.0446 (1.4848) | Mem (GB): 72.0000 (71.1674) | Time Elapsed: 01d 18h 34m | Loss/train_loss_objective: 0.3369 (0.7587) | Loss/train_loss_camera: 0.0992 (0.1647) | Loss/train_loss_T: 0.0541 (0.0906) | Loss/train_loss_R: 0.0365 (0.0591) | Loss/train_loss_FL: 0.0172 (0.0300) | Loss/train_loss_conf_depth: -0.1933 (-0.1156) | Loss/train_loss_reg_depth: 0.0283 (0.0435) | Loss/train_loss_grad_depth: 0.0059 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 0.9110 (2.9084) | Grad/camera: 0.3577 (0.4061)
INFO 2025-08-13 03:07:12,067 general.py: 113: Train Epoch: [21][ 430/1000000] | Batch Time: 38.1700 (8.9725) | Data Time: 0.0458 (1.4815) | Mem (GB): 72.0000 (71.1694) | Time Elapsed: 01d 18h 34m | Loss/train_loss_objective: 0.7484 (0.7587) | Loss/train_loss_camera: 0.1726 (0.1647) | Loss/train_loss_T: 0.1085 (0.0906) | Loss/train_loss_R: 0.0559 (0.0591) | Loss/train_loss_FL: 0.0163 (0.0300) | Loss/train_loss_conf_depth: -0.1556 (-0.1156) | Loss/train_loss_reg_depth: 0.0361 (0.0435) | Loss/train_loss_grad_depth: 0.0049 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.0690 (2.9041) | Grad/camera: 0.3096 (0.4058)
INFO 2025-08-13 03:07:14,558 general.py: 113: Train Epoch: [21][ 431/1000000] | Batch Time: 2.4908 (8.9575) | Data Time: 0.0274 (1.4781) | Mem (GB): 72.0000 (71.1713) | Time Elapsed: 01d 18h 34m | Loss/train_loss_objective: 1.2234 (0.7591) | Loss/train_loss_camera: 0.2050 (0.1648) | Loss/train_loss_T: 0.1074 (0.0906) | Loss/train_loss_R: 0.0873 (0.0591) | Loss/train_loss_FL: 0.0206 (0.0300) | Loss/train_loss_conf_depth: 0.1066 (-0.1154) | Loss/train_loss_reg_depth: 0.0848 (0.0435) | Loss/train_loss_grad_depth: 0.0071 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.3108 (2.9004) | Grad/camera: 0.3692 (0.4057)
INFO 2025-08-13 03:07:17,315 general.py: 113: Train Epoch: [21][ 432/1000000] | Batch Time: 2.7571 (8.9432) | Data Time: 0.0403 (1.4748) | Mem (GB): 72.0000 (71.1732) | Time Elapsed: 01d 18h 34m | Loss/train_loss_objective: 0.8995 (0.7594) | Loss/train_loss_camera: 0.1782 (0.1648) | Loss/train_loss_T: 0.0957 (0.0906) | Loss/train_loss_R: 0.0697 (0.0592) | Loss/train_loss_FL: 0.0255 (0.0300) | Loss/train_loss_conf_depth: -0.0559 (-0.1153) | Loss/train_loss_reg_depth: 0.0557 (0.0435) | Loss/train_loss_grad_depth: 0.0088 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 4.2505 (2.9036) | Grad/camera: 0.4953 (0.4060)
INFO 2025-08-13 03:07:35,600 general.py: 113: Train Epoch: [21][ 433/1000000] | Batch Time: 18.2845 (8.9647) | Data Time: 0.0461 (1.4715) | Mem (GB): 72.0000 (71.1751) | Time Elapsed: 01d 18h 35m | Loss/train_loss_objective: 0.8572 (0.7596) | Loss/train_loss_camera: 0.1697 (0.1648) | Loss/train_loss_T: 0.0981 (0.0906) | Loss/train_loss_R: 0.0610 (0.0592) | Loss/train_loss_FL: 0.0211 (0.0300) | Loss/train_loss_conf_depth: -0.0555 (-0.1152) | Loss/train_loss_reg_depth: 0.0580 (0.0436) | Loss/train_loss_grad_depth: 0.0062 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 4.6466 (2.9076) | Grad/camera: 0.4313 (0.4060)
INFO 2025-08-13 03:07:38,757 general.py: 113: Train Epoch: [21][ 434/1000000] | Batch Time: 3.1574 (8.9514) | Data Time: 0.0351 (1.4682) | Mem (GB): 72.0000 (71.1770) | Time Elapsed: 01d 18h 35m | Loss/train_loss_objective: 0.2230 (0.7591) | Loss/train_loss_camera: 0.0700 (0.1647) | Loss/train_loss_T: 0.0354 (0.0906) | Loss/train_loss_R: 0.0248 (0.0591) | Loss/train_loss_FL: 0.0197 (0.0300) | Loss/train_loss_conf_depth: -0.1648 (-0.1152) | Loss/train_loss_reg_depth: 0.0308 (0.0435) | Loss/train_loss_grad_depth: 0.0067 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 4.5970 (2.9115) | Grad/camera: 0.4186 (0.4060)
INFO 2025-08-13 03:07:42,949 general.py: 113: Train Epoch: [21][ 435/1000000] | Batch Time: 4.1923 (8.9405) | Data Time: 0.0453 (1.4649) | Mem (GB): 72.0000 (71.1789) | Time Elapsed: 01d 18h 35m | Loss/train_loss_objective: 0.4797 (0.7587) | Loss/train_loss_camera: 0.1304 (0.1647) | Loss/train_loss_T: 0.0701 (0.0906) | Loss/train_loss_R: 0.0452 (0.0591) | Loss/train_loss_FL: 0.0302 (0.0300) | Loss/train_loss_conf_depth: -0.2027 (-0.1154) | Loss/train_loss_reg_depth: 0.0256 (0.0435) | Loss/train_loss_grad_depth: 0.0050 (0.0072) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.4507 (2.9081) | Grad/camera: 0.6571 (0.4066)
INFO 2025-08-13 03:07:47,452 general.py: 113: Train Epoch: [21][ 436/1000000] | Batch Time: 4.5024 (8.9303) | Data Time: 0.0650 (1.4617) | Mem (GB): 72.0000 (71.1808) | Time Elapsed: 01d 18h 35m | Loss/train_loss_objective: 0.6384 (0.7579) | Loss/train_loss_camera: 0.1255 (0.1644) | Loss/train_loss_T: 0.0637 (0.0904) | Loss/train_loss_R: 0.0361 (0.0590) | Loss/train_loss_FL: 0.0514 (0.0301) | Loss/train_loss_conf_depth: -0.0572 (-0.1150) | Loss/train_loss_reg_depth: 0.0602 (0.0436) | Loss/train_loss_grad_depth: 0.0082 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.1176 (2.9040) | Grad/camera: 0.4219 (0.4067)
INFO 2025-08-13 03:07:49,916 general.py: 113: Train Epoch: [21][ 437/1000000] | Batch Time: 2.4645 (8.9156) | Data Time: 0.0966 (1.4586) | Mem (GB): 72.0000 (71.1826) | Time Elapsed: 01d 18h 35m | Loss/train_loss_objective: 0.9242 (0.7581) | Loss/train_loss_camera: 0.1987 (0.1644) | Loss/train_loss_T: 0.1250 (0.0904) | Loss/train_loss_R: 0.0659 (0.0590) | Loss/train_loss_FL: 0.0157 (0.0301) | Loss/train_loss_conf_depth: -0.1284 (-0.1150) | Loss/train_loss_reg_depth: 0.0519 (0.0436) | Loss/train_loss_grad_depth: 0.0074 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.8562 (2.9016) | Grad/camera: 0.3319 (0.4065)
INFO 2025-08-13 03:08:24,096 general.py: 113: Train Epoch: [21][ 438/1000000] | Batch Time: 34.1795 (8.9731) | Data Time: 0.0342 (1.4554) | Mem (GB): 72.0000 (71.1845) | Time Elapsed: 01d 18h 35m | Loss/train_loss_objective: 0.9957 (0.7583) | Loss/train_loss_camera: 0.1853 (0.1645) | Loss/train_loss_T: 0.1039 (0.0904) | Loss/train_loss_R: 0.0715 (0.0590) | Loss/train_loss_FL: 0.0196 (0.0301) | Loss/train_loss_conf_depth: -0.0039 (-0.1149) | Loss/train_loss_reg_depth: 0.0669 (0.0437) | Loss/train_loss_grad_depth: 0.0064 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 4.1658 (2.9045) | Grad/camera: 0.3421 (0.4063)
INFO 2025-08-13 03:08:27,810 general.py: 113: Train Epoch: [21][ 439/1000000] | Batch Time: 3.7142 (8.9612) | Data Time: 0.2357 (1.4526) | Mem (GB): 72.0000 (71.1864) | Time Elapsed: 01d 18h 36m | Loss/train_loss_objective: 0.5908 (0.7580) | Loss/train_loss_camera: 0.1246 (0.1644) | Loss/train_loss_T: 0.0718 (0.0904) | Loss/train_loss_R: 0.0444 (0.0589) | Loss/train_loss_FL: 0.0166 (0.0301) | Loss/train_loss_conf_depth: -0.0870 (-0.1148) | Loss/train_loss_reg_depth: 0.0489 (0.0437) | Loss/train_loss_grad_depth: 0.0060 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 1.8079 (2.9020) | Grad/camera: 0.5135 (0.4066)
INFO 2025-08-13 03:08:30,006 general.py: 113: Train Epoch: [21][ 440/1000000] | Batch Time: 2.1953 (8.9458) | Data Time: 0.0678 (1.4495) | Mem (GB): 72.0000 (71.1882) | Time Elapsed: 01d 18h 36m | Loss/train_loss_objective: 0.3028 (0.7574) | Loss/train_loss_camera: 0.0910 (0.1643) | Loss/train_loss_T: 0.0568 (0.0904) | Loss/train_loss_R: 0.0262 (0.0589) | Loss/train_loss_FL: 0.0161 (0.0301) | Loss/train_loss_conf_depth: -0.1877 (-0.1149) | Loss/train_loss_reg_depth: 0.0284 (0.0436) | Loss/train_loss_grad_depth: 0.0071 (0.0073) | Grad/aggregator: 0.0000 (0.0000) | Grad/depth: 4.1091 (2.9047) | Grad/camera: 0.4954 (0.4068)
```
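Since the per-step values oscillate a lot, the value in parentheses (the running average) is the more useful signal for judging convergence. A minimal sketch of pulling a metric out of lines in the log format above, so the smoothed trend can be tracked or plotted (`extract_metric` is a hypothetical helper, not part of the training code):

```python
import re

def extract_metric(line: str, name: str):
    """Return (last_value, running_avg) for a metric like
    'Loss/train_loss_R: 0.0527 (0.0589)', or None if absent."""
    # Require ':' right after the name so e.g. 'Loss/train_loss_R'
    # does not match inside 'Loss/train_loss_reg_depth'.
    m = re.search(re.escape(name) + r":\s*(-?\d+\.\d+)\s*\((-?\d+\.\d+)\)", line)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

line = "Loss/train_loss_R: 0.0527 (0.0589) | Loss/train_loss_T: 0.0858 (0.0905)"
print(extract_metric(line, "Loss/train_loss_R"))  # -> (0.0527, 0.0589)
```

Collecting the running-average column across all lines and plotting it makes the slow downward drift (e.g. of `Loss/train_loss_R`) visible despite the batch-to-batch noise.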
Yeah, this is converging relatively slowly. Ideally, after this many steps the average loss_R should be around 0.01.
But I trained for 16k iterations, and loss_R only went from 0.07 to 0.05 and loss_T from 0.12 to 0.08. Is the number of iterations far from enough?
@Jou719 I ran into the same situation you described. Were you able to solve it?