sc_depth_pl

training loss

Open Be997398715 opened this issue 3 years ago • 7 comments

Hi, I trained the scv2 model on the TUM dataset, and I found that the loss does not converge to a low value even after training for 70 epochs. image Are there any tricks for training? And what should I pay attention to during training?

Be997398715 avatar Jan 19 '22 04:01 Be997398715

  1. The training loss looks too small ("7.85e-06") at Epoch 0. This means that the training is already incorrect at the beginning.
  2. I haven't tried the TUM dataset. You should make sure that the pre-processing is correct. For example, the original video is 30 fps, so you have to downsample it to make sure that adjacent frames have sufficient camera movement.
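
The downsampling step in point 2 could be sketched roughly as follows. This is only an illustration, not code from the repository: it assumes a TUM-style `rgb.txt` index (one `timestamp filename` pair per line, `#` for comments), and the function names and the target rate are hypothetical choices.

```python
def parse_rgb_txt(text):
    """Parse TUM-style 'timestamp filename' lines, skipping '#' comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ts, name = line.split()[:2]
        entries.append((float(ts), name))
    return entries


def subsample_by_time(entries, target_fps):
    """Keep entries spaced at least 1/target_fps seconds apart.

    entries: list of (timestamp_seconds, filename), sorted by timestamp.
    target_fps: hypothetical target rate; tune it so that adjacent
    kept frames show enough camera movement.
    """
    min_gap = 1.0 / target_fps
    kept = []
    last_t = None
    for t, name in entries:
        # Small tolerance so floating-point rounding does not drop a frame
        # that lies exactly on the expected spacing.
        if last_t is None or t - last_t >= min_gap - 1e-6:
            kept.append((t, name))
            last_t = t
    return kept
```

For a 30 fps sequence, `subsample_by_time(entries, target_fps=6)` keeps roughly every fifth frame. A fixed rate does not guarantee motion, so a sequence with long pauses may still need a motion-based check.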

JiawangBian avatar Jan 19 '22 09:01 JiawangBian

Thanks for your reply. I adjusted the learning rate to 1e-4, and the training loss now looks normal. For pre-processing, I sampled the TUM dataset at 6 fps. The problem is that val/a1 is fairly high at the validation sanity check, but after training for 50 epochs the val loss is higher and val/a1 is lower than at the validation sanity check, like this:

Validation sanity check: 80%|███████████████████████████████████████████▏ | 4/5 [00:01<00:00, 1.52it/s] val_loss 0.3161110281944275 val/abs_diff 0.49667494297027587 val/abs_rel 0.3161110281944275 val/a1 0.720618724822998 val/a2 0.8686295628547669 val/a3 0.8972877621650696
Epoch 0: 100%|██████████████████████████████████████████| 3274/3274 [24:42<00:00, 2.21it/s, loss=0.124, v_num=33] val_loss 0.3295358036745322 val/abs_diff 0.48291838420948513 val/abs_rel 0.3295358036745322 val/a1 0.5141117756425495 val/a2 0.7822030260109566 val/a3 0.8879046865472211
Epoch 1: 100%|██████████████████████████▉| 3273/3274 [24:22<00:00, 2.24it/s, loss=0.11, v_num=33, val_loss=0.330] val_loss 0.3527160783433858 val/abs_diff 0.49925796043984766 val/abs_rel 0.3527160783433858 val/a1 0.49046033935647615 val/a2 0.7741567319687543 val/a3 0.889263995096717
Epoch 2: 100%|█████████████████████████▉| 3273/3274 [24:28<00:00, 2.23it/s, loss=0.112, v_num=33, val_loss=0.353] val_loss 0.38753977480908514 val/abs_diff 0.5399117107052758 val/abs_rel 0.38753977480908514 val/a1 0.447061596516992 val/a2 0.7416755366773112 val/a3 0.8804207127978544
Epoch 3: 100%|██████████████████████████| 3274/3274 [24:40<00:00, 2.21it/s, loss=0.135, v_num=33, val_loss=0.388] val_loss 0.36882862233569924 val/abs_diff 0.5172424256067041 val/abs_rel 0.36882862233569924 val/a1 0.47232650241381685 val/a2 0.7577860181180525 val/a3 0.8850341230770792
Epoch 4: 100%|█████████████████████████▉| 3273/3274 [24:41<00:00, 2.21it/s, loss=0.128, v_num=33, val_loss=0.369] val_loss 0.3958435264446646 val/abs_diff 0.5531420344618004 val/abs_rel 0.3958435264446646 val/a1 0.4355808907656323 val/a2 0.7290604630126639 val/a3 0.8741154323721156
Epoch 5: 100%|████████████████████████▉| 3273/3274 [26:37<00:00, 2.05it/s, loss=0.0971, v_num=33, val_loss=0.396] val_loss 0.3789747621112027 val/abs_diff 0.5287676574618604 val/abs_rel 0.3789747621112027 val/a1 0.4594967093126315 val/a2 0.7496168315410614 val/a3 0.8829869093469611
Epoch 6: 100%|██████████████████████████| 3274/3274 [27:49<00:00, 1.96it/s, loss=0.103, v_num=33, val_loss=0.379] val_loss 0.40094069646837566 val/abs_diff 0.5627620091110888 val/abs_rel 0.40094069646837566 val/a1 0.4145252222081585 val/a2 0.7240058928728104 val/a3 0.8716801943633478
Epoch 7: 100%|████████████████████████▉| 3273/3274 [24:28<00:00, 2.23it/s, loss=0.0955, v_num=33, val_loss=0.401] val_loss 0.41014841760338194 val/abs_diff 0.568704753274649 val/abs_rel 0.41014841760338194 val/a1 0.4229128777001385 val/a2 0.7166387138652129 val/a3 0.8712038172522621
Epoch 8: 100%|██████████████████████████| 3274/3274 [24:51<00:00, 2.19it/s, loss=0.125, v_num=33, val_loss=0.410] val_loss 0.35890787185777523 val/abs_diff 0.5118728035737371 val/abs_rel 0.35890787185777523 val/a1 0.47335953807606945 val/a2 0.763609798651346 val/a3 0.8837065945768581
Epoch 9: 100%|████████████████████████▉| 3273/3274 [25:27<00:00, 2.14it/s, loss=0.0966, v_num=33, val_loss=0.359] val_loss 0.4000579456978001 val/abs_diff 0.5526390229242508 val/abs_rel 0.4000579456978001 val/a1 0.4362660559864951 val/a2 0.7314180177002446 val/a3 0.8779567775312164
Epoch 10: 100%|███████████████████████▉| 3273/3274 [26:47<00:00, 2.04it/s, loss=0.0985, v_num=33, val_loss=0.400] val_loss 0.35873477244880836 val/abs_diff 0.5151269339164938 val/abs_rel 0.35873477244880836 val/a1 0.4689700790274311 val/a2 0.7614134610258917 val/a3 0.8808301417200778
Epoch 11: 100%|████████████████████████▉| 3273/3274 [24:23<00:00, 2.24it/s, loss=0.111, v_num=33, val_loss=0.359] val_loss 0.35651230175450377 val/abs_diff 0.5075158562703591 val/abs_rel 0.35651230175450377 val/a1 0.47882510676210477 val/a2 0.7671535994525247 val/a3 0.8851720486448404
Epoch 12: 100%|████████████████████████▉| 3273/3274 [24:30<00:00, 2.23it/s, loss=0.104, v_num=33, val_loss=0.357] val_loss 0.3774093163720039 val/abs_diff 0.5295091758661427 val/abs_rel 0.3774093163720039 val/a1 0.4564755105986282 val/a2 0.7478528012272337 val/a3 0.8816205172191763
Epoch 13: 100%|███████████████████████▉| 3273/3274 [24:47<00:00, 2.20it/s, loss=0.0884, v_num=33, val_loss=0.377] val_loss 0.4109419768789844 val/abs_diff 0.5717653512185168 val/abs_rel 0.4109419768789844 val/a1 0.4142954336575499 val/a2 0.7145573688504842 val/a3 0.8694515650820844
Epoch 14: 100%|███████████████████████▉| 3273/3274 [24:28<00:00, 2.23it/s, loss=0.0915, v_num=33, val_loss=0.411] val_loss 0.3748826674172576 val/abs_diff 0.5288800416173229 val/abs_rel 0.3748826674172576 val/a1 0.4548990779289617 val/a2 0.74878803874965 val/a3 0.8805965936519731
Epoch 15: 100%|████████████████████████▉| 3273/3274 [24:26<00:00, 2.23it/s, loss=0.108, v_num=33, val_loss=0.375] val_loss 0.36191915622759313 val/abs_diff 0.5158892360651437 val/abs_rel 0.36191915622759313 val/a1 0.46861379897930255 val/a2 0.7593182609394682 val/a3 0.8824431157447923
Epoch 16: 100%|████████████████████████▉| 3273/3274 [24:27<00:00, 2.23it/s, loss=0.105, v_num=33, val_loss=0.362] val_loss 0.3901017568830593 val/abs_diff 0.5516582035892447 val/abs_rel 0.3901017568830593 val/a1 0.42623640075675756 val/a2 0.729820976444813 val/a3 0.8724550418730633
Epoch 17: 100%|███████████████████████▉| 3273/3274 [24:21<00:00, 2.24it/s, loss=0.0965, v_num=33, val_loss=0.390] val_loss 0.3930978460036253 val/abs_diff 0.5510394933497962 val/abs_rel 0.3930978460036253 val/a1 0.4303495900207002 val/a2 0.7301531026900654 val/a3 0.8755578554012406
Epoch 18: 100%|████████████████████████▉| 3273/3274 [24:39<00:00, 2.21it/s, loss=0.114, v_num=33, val_loss=0.393] val_loss 0.3728100278812675 val/abs_diff 0.5251243403400054 val/abs_rel 0.3728100278812675 val/a1 0.45696582897028454 val/a2 0.7518478541027213 val/a3 0.8827132495915946
Epoch 19: 100%|███████████████████████▉| 3273/3274 [24:25<00:00, 2.23it/s, loss=0.0948, v_num=33, val_loss=0.373] val_loss 0.388417719098184 val/abs_diff 0.5430796397368953 val/abs_rel 0.388417719098184 val/a1 0.44228708869974376 val/a2 0.7364000470985269 val/a3 0.8776327044191495
Epoch 20: 100%|███████████████████████▉| 3273/3274 [24:37<00:00, 2.21it/s, loss=0.0955, v_num=33, val_loss=0.388] val_loss 0.3781192291685393 val/abs_diff 0.5315488791661643 val/abs_rel 0.3781192291685393 val/a1 0.4515975461529454 val/a2 0.7456193133279192 val/a3 0.8811393272429006
Epoch 21: 100%|███████████████████████▉| 3273/3274 [24:23<00:00, 2.24it/s, loss=0.0961, v_num=33, val_loss=0.378] val_loss 0.37973579075274894 val/abs_diff 0.5328741206879347 val/abs_rel 0.37973579075274894 val/a1 0.44863458509456383 val/a2 0.7457987694113467 val/a3 0.8808330554917385
Epoch 22: 100%|████████████████████████▉| 3273/3274 [24:32<00:00, 2.22it/s, loss=0.108, v_num=33, val_loss=0.380] val_loss 0.3715814605285304 val/abs_diff 0.5215684282877635 val/abs_rel 0.3715814605285304 val/a1 0.46044591251113604 val/a2 0.7557624499422844 val/a3 0.8848951065764181
Epoch 23: 100%|███████████████████████▉| 3273/3274 [24:23<00:00, 2.24it/s, loss=0.0997, v_num=33, val_loss=0.372] val_loss 0.3606194248466704 val/abs_diff 0.5079990155260328 val/abs_rel 0.3606194248466704 val/a1 0.47955402364613303 val/a2 0.764591505829717 val/a3 0.8870922036014253
Epoch 24: 100%|███████████████████████▉| 3273/3274 [24:22<00:00, 2.24it/s, loss=0.0782, v_num=33, val_loss=0.361] val_loss 0.37530080834185015 val/abs_diff 0.53074417983842 val/abs_rel 0.37530080834185015 val/a1 0.45020496953681044 val/a2 0.7460091985027555 val/a3 0.879656540815819
Epoch 25: 100%|███████████████████████▉| 3273/3274 [25:03<00:00, 2.18it/s, loss=0.0888, v_num=33, val_loss=0.375] val_loss 0.3829835808962723 val/abs_diff 0.5382425882656809 val/abs_rel 0.3829835808962723 val/a1 0.443165370765986 val/a2 0.7395624119631001 val/a3 0.8783617734629223
Epoch 26: 100%|███████████████████████▉| 3273/3274 [24:36<00:00, 2.22it/s, loss=0.0837, v_num=33, val_loss=0.383] val_loss 0.381393966697891 val/abs_diff 0.5369501236843671 val/abs_rel 0.381393966697891 val/a1 0.4464323749074913 val/a2 0.739623209120522 val/a3 0.8770313211170161
Epoch 27: 100%|███████████████████████▉| 3273/3274 [24:24<00:00, 2.24it/s, loss=0.0952, v_num=33, val_loss=0.381] val_loss 0.3675002611019242 val/abs_diff 0.5193339754333238 val/abs_rel 0.3675002611019242 val/a1 0.46400740466347323 val/a2 0.7563679257468998 val/a3 0.8830085393968322
Epoch 28: 100%|█████████████████████████| 3274/3274 [24:52<00:00, 2.19it/s, loss=0.105, v_num=33, val_loss=0.368] val_loss 0.39851251547115507 val/abs_diff 0.5545077231918143 val/abs_rel 0.39851251547115507 val/a1 0.4312775203949409 val/a2 0.7246398873312373 val/a3 0.8743955541944279
Epoch 29: 100%|███████████████████████▉| 3273/3274 [24:23<00:00, 2.24it/s, loss=0.0971, v_num=33, val_loss=0.399] val_loss 0.379246566462125 val/abs_diff 0.5329474487075223 val/abs_rel 0.379246566462125 val/a1 0.44892661081093577 val/a2 0.7435859131141448 val/a3 0.8796466749878556
Epoch 30: 100%|████████████████████████▉| 3273/3274 [24:36<00:00, 2.22it/s, loss=0.103, v_num=33, val_loss=0.379] val_loss 0.3734849472770668 val/abs_diff 0.5233722313748839 val/abs_rel 0.3734849472770668 val/a1 0.46056050558884937 val/a2 0.7507722611438501 val/a3 0.883642831458732
Epoch 31: 100%|███████████████████████▉| 3273/3274 [24:33<00:00, 2.22it/s, loss=0.0961, v_num=33, val_loss=0.373] val_loss 0.39918553376211807 val/abs_diff 0.5522576884274751 val/abs_rel 0.39918553376211807 val/a1 0.43584290132835996 val/a2 0.7253975143455004 val/a3 0.874930739542688
Epoch 32: 100%|████████████████████████▉| 3273/3274 [24:31<00:00, 2.22it/s, loss=0.118, v_num=33, val_loss=0.399] val_loss 0.3880781775314203 val/abs_diff 0.5435131893191539 val/abs_rel 0.3880781775314203 val/a1 0.4368901803320003 val/a2 0.7336169482676635 val/a3 0.8775140195385391
Epoch 33: 100%|███████████████████████▉| 3273/3274 [24:21<00:00, 2.24it/s, loss=0.0875, v_num=33, val_loss=0.388] val_loss 0.38515366083929237 val/abs_diff 0.5368239465453815 val/abs_rel 0.38515366083929237 val/a1 0.4462980718959665 val/a2 0.7406887321125174 val/a3 0.8813466203044837
Epoch 34: 100%|████████████████████████▉| 3273/3274 [24:46<00:00, 2.20it/s, loss=0.124, v_num=33, val_loss=0.385] val_loss 0.38438736185641353 val/abs_diff 0.5410328798625671 val/abs_rel 0.38438736185641353 val/a1 0.4379732316718415 val/a2 0.7365776150719101 val/a3 0.8777100358490653
Epoch 35: 100%|████████████████████████▉| 3273/3274 [24:30<00:00, 2.23it/s, loss=0.088, v_num=33, val_loss=0.384] val_loss 0.3914675456537327 val/abs_diff 0.5481232617382712 val/abs_rel 0.3914675456537327 val/a1 0.43125081030835566 val/a2 0.7304554420737593 val/a3 0.8752293599323487
Epoch 36: 100%|███████████████████████▉| 3273/3274 [25:03<00:00, 2.18it/s, loss=0.0948, v_num=33, val_loss=0.391] val_loss 0.39373155959615125 val/abs_diff 0.5514842373244639 val/abs_rel 0.39373155959615125 val/a1 0.4284965390330749 val/a2 0.725653072198232 val/a3 0.8736294549955449
Epoch 37: 100%|████████████████████████▉| 3273/3274 [24:33<00:00, 2.22it/s, loss=0.101, v_num=33, val_loss=0.394] val_loss 0.3873164584344262 val/abs_diff 0.5405710958678958 val/abs_rel 0.3873164584344262 val/a1 0.44350277597495646 val/a2 0.7360805222685908 val/a3 0.8778209417638644
Epoch 38: 100%|████████████████████████▉| 3273/3274 [24:25<00:00, 2.23it/s, loss=0.111, v_num=33, val_loss=0.387] val_loss 0.3781529624942043 val/abs_diff 0.5300889742766188 val/abs_rel 0.3781529624942043 val/a1 0.45272680840721713 val/a2 0.7451562850687985 val/a3 0.8815343434541998
Epoch 39: 100%|███████████████████████▉| 3273/3274 [24:29<00:00, 2.23it/s, loss=0.0927, v_num=33, val_loss=0.378] val_loss 0.3776978533802458 val/abs_diff 0.526599929891002 val/abs_rel 0.3776978533802458 val/a1 0.4602837035423713 val/a2 0.7479946798002216 val/a3 0.883074292974293
Epoch 40: 100%|███████████████████████▉| 3273/3274 [24:25<00:00, 2.23it/s, loss=0.0826, v_num=33, val_loss=0.378] val_loss 0.385090944772595 val/abs_diff 0.5431642878293431 val/abs_rel 0.385090944772595 val/a1 0.4366230027216701 val/a2 0.7345313323495534 val/a3 0.8751890662130616
Epoch 41: 100%|███████████████████████▉| 3273/3274 [24:32<00:00, 2.22it/s, loss=0.0914, v_num=33, val_loss=0.385] val_loss 0.3942719465165351 val/abs_diff 0.5477689247875707 val/abs_rel 0.3942719465165351 val/a1 0.4350689423615944 val/a2 0.7309926998727199 val/a3 0.8773289303264707
Epoch 42: 100%|████████████████████████▉| 3273/3274 [24:30<00:00, 2.23it/s, loss=0.102, v_num=33, val_loss=0.394] val_loss 0.38924536301356527 val/abs_diff 0.5428741996486982 val/abs_rel 0.38924536301356527 val/a1 0.4385410950217449 val/a2 0.7347778519834152 val/a3 0.8781546636926176
Epoch 43: 100%|███████████████████████▉| 3273/3274 [24:25<00:00, 2.23it/s, loss=0.0916, v_num=33, val_loss=0.389] val_loss 0.3913817027924766 val/abs_diff 0.5475860089063644 val/abs_rel 0.3913817027924766 val/a1 0.43407347126745843 val/a2 0.7318789227887499 val/a3 0.875541521629817
Epoch 44: 100%|███████████████████████▉| 3273/3274 [24:33<00:00, 2.22it/s, loss=0.0813, v_num=33, val_loss=0.391] val_loss 0.40142009862012146 val/abs_diff 0.562545551240724 val/abs_rel 0.40142009862012146 val/a1 0.4139906801633152 val/a2 0.7204577144882489 val/a3 0.8719082838492774
Epoch 45: 100%|████████████████████████▉| 3273/3274 [24:28<00:00, 2.23it/s, loss=0.093, v_num=33, val_loss=0.401] val_loss 0.37673353484258965 val/abs_diff 0.5296643799599348 val/abs_rel 0.37673353484258965 val/a1 0.45337583303031787 val/a2 0.745682569876523 val/a3 0.8803480772625113
Epoch 46: 100%|████████████████████████▉| 3273/3274 [24:31<00:00, 2.22it/s, loss=0.104, v_num=33, val_loss=0.377] val_loss 0.3941754935213098 val/abs_diff 0.5507536660617506 val/abs_rel 0.3941754935213098 val/a1 0.4313530304902036 val/a2 0.7266523207717098 val/a3 0.8746060358246727
Epoch 47: 100%|████████████████████████▉| 3273/3274 [24:29<00:00, 2.23it/s, loss=0.107, v_num=33, val_loss=0.394] val_loss 0.3831328393043207 val/abs_diff 0.539158231425733 val/abs_rel 0.3831328393043207 val/a1 0.44004447366430166 val/a2 0.737266888221105 val/a3 0.8778958296831785
Epoch 48: 100%|███████████████████████▉| 3273/3274 [24:32<00:00, 2.22it/s, loss=0.0913, v_num=33, val_loss=0.383] val_loss 0.39040741845615595 val/abs_diff 0.5474288450999999 val/abs_rel 0.39040741845615595 val/a1 0.43133784151972737 val/a2 0.7282100907513793 val/a3 0.8748385269597103
Epoch 49: 100%|████████████████████████▉| 3273/3274 [24:24<00:00, 2.23it/s, loss=0.097, v_num=33, val_loss=0.390] val_loss 0.3763153957369182 val/abs_diff 0.5297080642180824 val/abs_rel 0.3763153957369182 val/a1 0.4508191234838795 val/a2 0.7448900001989284 val/a3 0.8803316037139982
Epoch 50: 100%|████████████████████████| 3274/3274 [24:44<00:00, 2.20it/s, loss=0.0958, v_num=33, val_loss=0.376] val_loss 0.3763055424315269 val/abs_diff 0.5327091330548687 val/abs_rel 0.3763055424315269 val/a1 0.4462534080088978 val/a2 0.7444079391693286 val/a3 0.8788084100949373
Epoch 51: 100%|███████████████████████▉| 3273/3274 [24:29<00:00, 2.23it/s, loss=0.0763, v_num=33, val_loss=0.376] val_loss 0.38802317503682326 val/abs_diff 0.5452832289634736 val/abs_rel 0.38802317503682326 val/a1 0.4347348131694144 val/a2 0.731514432130845 val/a3 0.87595517310738
Epoch 52: 100%|███████████████████████▉| 3273/3274 [24:57<00:00, 2.19it/s, loss=0.0858, v_num=33, val_loss=0.388] val_loss 0.39606957476883425 val/abs_diff 0.554704971226728 val/abs_rel 0.39606957476883425 val/a1 0.4251598620988394 val/a2 0.7257501096093039 val/a3 0.873932358244775

So, is there any way to find out what the problem is? Thank you :)
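
One way to make the trend in a log like this easier to inspect is to pull the validation metrics out of the console text and compare every epoch against the sanity-check baseline. The sketch below is only an illustration: it assumes the exact `val_loss ... val/a3 ...` print order seen in the log above, and `parse_val_metrics` and `best_record` are hypothetical helper names, not part of sc_depth_pl.

```python
import re

# A plain decimal number, optionally in scientific notation.
NUM = r"(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"

# Matches one printed validation summary. The progress-bar field
# "val_loss=0.330" is skipped because it uses '=' instead of a space.
PATTERN = re.compile(
    r"val_loss\s+" + NUM
    + r"\s+val/abs_diff\s+" + NUM
    + r"\s+val/abs_rel\s+" + NUM
    + r"\s+val/a1\s+" + NUM
    + r"\s+val/a2\s+" + NUM
    + r"\s+val/a3\s+" + NUM
)
KEYS = ("val_loss", "abs_diff", "abs_rel", "a1", "a2", "a3")


def parse_val_metrics(log_text):
    """Return one dict of floats per validation summary, in log order.

    If the log starts with the validation sanity check, record 0 is that
    baseline and record i + 1 belongs to epoch i.
    """
    return [
        dict(zip(KEYS, map(float, match.groups())))
        for match in PATTERN.finditer(log_text)
    ]


def best_record(records, key, lower_is_better=True):
    """Return (index, record) of the best entry for the given metric."""
    pick = min if lower_is_better else max
    return pick(enumerate(records), key=lambda item: item[1][key])
```

If the sanity check turns out to be the best record for both `abs_rel` and `a1`, the model is getting worse under training from the very first epoch, which points at the data or the loss rather than at the number of epochs.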

Be997398715 avatar Jan 20 '22 02:01 Be997398715

Can you open TensorBoard by typing "tensorboard --logdir=ckpts/"? Then you can see the visualization results.

JiawangBian avatar Jan 20 '22 22:01 JiawangBian

Yes, the images and losses look like this: (10 images)

Be997398715 avatar Jan 21 '22 02:01 Be997398715

@Be997398715 Have you evaluated the trajectory with the trained model?

wanglong1008 avatar Jul 20 '22 05:07 wanglong1008

Hi, I trained the scv2 model on the TUM dataset, and I found that the loss does not converge to a low value even after training for 70 epochs. image Are there any tricks for training? And what should I pay attention to during training?

I have the same problem. Have you solved it?

ZhiyiHe1997 avatar Aug 26 '22 06:08 ZhiyiHe1997

I tried several TUM sequences. I found that there are too many static-camera frames, and I think this is a significant issue. I will provide a solution soon.
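
Until that solution is available, one possible workaround (not the maintainer's method) is to drop frames where the camera has barely moved. The sketch below assumes the TUM `groundtruth.txt` layout of `timestamp tx ty tz qx qy qz qw` per line, uses translation only and ignores rotation, and the 5 cm threshold is an arbitrary hypothetical value.

```python
import math


def parse_groundtruth(text):
    """Parse TUM-style pose lines into (timestamp, tx, ty, tz) tuples."""
    poses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Only the timestamp and translation are used; the quaternion
        # (qx qy qz qw) is ignored in this simplified sketch.
        t, x, y, z = (float(v) for v in fields[:4])
        poses.append((t, x, y, z))
    return poses


def select_moving_frames(poses, min_translation=0.05):
    """Keep timestamps whose camera position moved far enough.

    A pose is kept when it lies at least min_translation metres
    (hypothetical threshold) from the last kept pose, so long
    static-camera stretches collapse to a single frame.
    """
    kept = []
    last_xyz = None
    for t, x, y, z in poses:
        if last_xyz is None or math.dist((x, y, z), last_xyz) >= min_translation:
            kept.append(t)
            last_xyz = (x, y, z)
    return kept
```

The kept pose timestamps still have to be associated with the nearest image timestamps, since TUM records poses and images at different rates. Pure rotations are discarded here as well, which may or may not be desirable for the training loss.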

JiawangBian avatar Aug 26 '22 06:08 JiawangBian