CREStereo nan

2022/06/01 14:17:17 Model params saved: train_logs/models/epoch-1.mge
2022/06/01 14:17:25 0.66 b/s,passed:00:13:16,eta:21:41:36,data_time:0.16,lr:0.0004,[2/100:5/500] ==> loss:26.19
2022/06/01 14:17:32 0.65 b/s,passed:00:13:24,eta:21:40:40,data_time:0.17,lr:0.0004,[2/100:10/500] ==> loss:6.847
2022/06/01 14:17:40 0.68 b/s,passed:00:13:31,eta:21:39:57,data_time:0.14,lr:0.0004,[2/100:15/500] ==> loss:6.83
2022/06/01 14:17:47 0.67 b/s,passed:00:13:39,eta:21:39:12,data_time:0.16,lr:0.0004,[2/100:20/500] ==> loss:16.89
2022/06/01 14:17:55 0.66 b/s,passed:00:13:46,eta:21:38:28,data_time:0.17,lr:0.0004,[2/100:25/500] ==> loss:43.18
2022/06/01 14:18:02 0.66 b/s,passed:00:13:54,eta:21:37:36,data_time:0.17,lr:0.0004,[2/100:30/500] ==> loss:20.37
2022/06/01 14:18:10 0.65 b/s,passed:00:14:01,eta:21:36:52,data_time:0.18,lr:0.0004,[2/100:35/500] ==> loss:15.24
2022/06/01 14:18:17 0.65 b/s,passed:00:14:09,eta:21:36:18,data_time:0.19,lr:0.0004,[2/100:40/500] ==> loss:9.399
2022/06/01 14:18:25 0.67 b/s,passed:00:14:16,eta:21:35:41,data_time:0.16,lr:0.0004,[2/100:45/500] ==> loss:40.27
2022/06/01 14:18:32 0.68 b/s,passed:00:14:24,eta:21:34:58,data_time:0.14,lr:0.0004,[2/100:50/500] ==> loss:15.02
2022/06/01 14:18:40 0.69 b/s,passed:00:14:31,eta:21:34:14,data_time:0.14,lr:0.0004,[2/100:55/500] ==> loss:32.48
2022/06/01 14:18:47 0.65 b/s,passed:00:14:39,eta:21:33:42,data_time:0.18,lr:0.0004,[2/100:60/500] ==> loss:9.96
2022/06/01 14:18:55 0.65 b/s,passed:00:14:46,eta:21:33:16,data_time:0.18,lr:0.0004,[2/100:65/500] ==> loss:14.69
2022/06/01 14:19:02 0.68 b/s,passed:00:14:54,eta:21:32:35,data_time:0.13,lr:0.0004,[2/100:70/500] ==> loss:nan
2022/06/01 14:19:10 0.65 b/s,passed:00:15:01,eta:21:31:55,data_time:0.19,lr:0.0004,[2/100:75/500] ==> loss:nan
2022/06/01 14:19:17 0.68 b/s,passed:00:15:09,eta:21:31:14,data_time:0.15,lr:0.0004,[2/100:80/500] ==> loss:nan
2022/06/01 14:19:25 0.67 b/s,passed:00:15:16,eta:21:30:34,data_time:0.15,lr:0.0004,[2/100:85/500] ==> loss:nan
2022/06/01 14:19:32 0.67 b/s,passed:00:15:24,eta:21:30:08,data_time:0.17,lr:0.0004,[2/100:90/500] ==> loss:nan
2022/06/01 14:19:40 0.69 b/s,passed:00:15:31,eta:21:29:28,data_time:0.14,lr:0.0004,[2/100:95/500] ==> loss:nan
2022/06/01 14:19:47 0.65 b/s,passed:00:15:39,eta:21:28:54,data_time:0.17,lr:0.0004,[2/100:100/500] ==> loss:nan
2022/06/01 14:19:55 0.68 b/s,passed:00:15:46,eta:21:28:11,data_time:0.14,lr:0.0004,[2/100:105/500] ==> loss:nan
2022/06/01 14:20:02 0.65 b/s,passed:00:15:54,eta:21:27:38,data_time:0.17,lr:0.0004,[2/100:110/500] ==> loss:nan
2022/06/01 14:20:10 0.64 b/s,passed:00:16:01,eta:21:27:04,data_time:0.2,lr:0.0004,[2/100:115/500] ==> loss:nan
2022/06/01 14:20:17 0.67 b/s,passed:00:16:09,eta:21:26:28,data_time:0.16,lr:0.0004,[2/100:120/500] ==> loss:nan
2022/06/01 14:20:25 0.66 b/s,passed:00:16:16,eta:21:26:04,data_time:0.17,lr:0.0004,[2/100:125/500] ==> loss:nan
2022/06/01 14:20:32 0.68 b/s,passed:00:16:24,eta:21:25:20,data_time:0.15,lr:0.0004,[2/100:130/500] ==> loss:nan

Hello! This is my training log. Why does the loss become NaN?

Jun 01 '22 06:06 jim88481
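
For anyone comparing runs, here is a minimal sketch of finding the first iteration where the loss stops being finite in a log shaped like the one above. The regular expression and the function name `first_nan` are assumptions based only on the posted log lines; they are not part of CREStereo.

```python
import math
import re

# Matches the "[epoch/total:iter/total] ==> loss:value" tail of each record.
# Assumed from the posted log format; not a CREStereo utility.
RECORD = re.compile(r"\[(\d+)/\d+:(\d+)/\d+\] ==> loss:(\S+)")

def first_nan(log_text):
    """Return (epoch, iteration) of the first non-finite loss, or None."""
    for epoch, iteration, loss in RECORD.findall(log_text):
        if not math.isfinite(float(loss)):
            return int(epoch), int(iteration)
    return None

# Two records copied from the log above.
sample = (
    "2022/06/01 14:18:55 0.65 b/s,passed:00:14:46,eta:21:33:16,data_time:0.18,"
    "lr:0.0004,[2/100:65/500] ==> loss:14.69\n"
    "2022/06/01 14:19:02 0.68 b/s,passed:00:14:54,eta:21:32:35,data_time:0.13,"
    "lr:0.0004,[2/100:70/500] ==> loss:nan\n"
)
print(first_nan(sample))  # (2, 70)
```

In the posted log the loss is printed every 5 iterations, so the actual first bad step lies somewhere between iterations 66 and 70 of epoch 2.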

Hi, have you solved this issue? I encountered this problem when training on Sceneflow.

Jul 22 '22 13:07 WenjiaR

Hi, have you solved this issue? I encountered this problem when training on Sceneflow.

Yes, I solved it by using https://github.com/ibaiGorordo/CREStereo-Pytorch, and there is not much difference in their effectiveness after 500 epochs.

Jul 22 '22 14:07 jim88481

Hi, have you solved this issue? I encountered this problem when training on Sceneflow.

Besides, I think it's a lack of memory that causes the NaN. You can try addressing that.

Jul 22 '22 14:07 jim88481
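
Whatever the cause turns out to be, it may help to stop at the first non-finite loss rather than keep training, so the offending batch can be inspected. Below is a framework-agnostic sketch; `compute_loss` and `apply_update` are hypothetical placeholders for the forward pass and optimizer step of whichever implementation is used. It does not test the memory hypothesis, it only narrows down where the NaN first appears.

```python
import math

def train_with_nan_guard(batches, compute_loss, apply_update):
    """Run one pass over `batches`; stop before updating on a non-finite loss.

    Returns (index, batch) for the first batch whose loss is NaN or inf,
    or None if every loss was finite.
    """
    for index, batch in enumerate(batches):
        loss = compute_loss(batch)
        if not math.isfinite(loss):
            # Do not apply this update; hand the batch back for inspection.
            return index, batch
        apply_update(loss)
    return None

# Toy usage: the "loss" is just twice the batch value, and the third batch is bad.
applied = []
result = train_with_nan_guard(
    batches=[1.0, 2.0, float("inf"), 3.0],
    compute_loss=lambda b: b * 2.0,
    apply_update=applied.append,
)
print(result)   # (2, inf)
print(applied)  # [2.0, 4.0]
```

In a real loop the loss would first be converted to a Python float, and the returned batch could be saved to disk and replayed on its own.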

Hi, have you solved this issue? I encountered this problem when training on Sceneflow.

Yes, I solved it by using https://github.com/ibaiGorordo/CREStereo-Pytorch, and there is not much difference in their effectiveness after 500 epochs.

Thank you for your reply! I will try in this way.

Jul 22 '22 14:07 WenjiaR

Hi, have you solved this issue? I encountered this problem when training on Sceneflow.

Besides, I think it's a lack of memory that causes the NaN. You can try addressing that.

Were you able to reproduce their performance with the PyTorch implementation? I tried the repo you mentioned, but I'm still getting NaN loss after some epochs. If you were able to reproduce it, which datasets did you use? Please specify the sub-datasets, such as "monkaa", and whether you used the "clean" or "final" versions. Thanks!

Aug 23 '22 01:08 deephog

@deephog I used https://github.com/ibaiGorordo/CREStereo-Pytorch and that solved the NaN. I used the dataset on the Baidu web disk provided by the author (download from BaiduCloud here (extraction code: aa3g) and extract the tar files manually). This is the result after 200 epochs.

Aug 23 '22 03:08 jim88481

@deephog I used https://github.com/ibaiGorordo/CREStereo-Pytorch and that solved the NaN. I used the dataset on the Baidu web disk provided by the author (download from BaiduCloud here (extraction code: aa3g) and extract the tar files manually). This is the result after 200 epochs.

Did you compare the final results to the pre-trained model they provide? I can get similar results, but never results as good as theirs.

Aug 23 '22 16:08 deephog

@deephog I used https://github.com/ibaiGorordo/CREStereo-Pytorch and that solved the NaN. I used the dataset on the Baidu web disk provided by the author (download from BaiduCloud here (extraction code: aa3g) and extract the tar files manually). This is the result after 200 epochs.

Did you compare the final results to the pre-trained model they provide? I can get similar results, but never results as good as theirs.

I'm sorry, this is something I did a few months ago. I only remember that after 500 epochs the results were more or less adequate for my needs. In general, though, the author's pre-trained model is the best, and it is normal not to match the author's results.

Aug 23 '22 22:08 jim88481
