PyTorch-Encoding some questions about train.py

Open dr18112004 opened this issue 5 years ago • 5 comments

Hi, thanks for your work. My environment is: ubuntu==18.04, python==3.6.8, pytorch==1.4.0, cuda==10.0. When I first ran train.py, there was an error about 'ninja'. I resolved it and obtained a result. But when I changed the model from 'encnet' to 'deeplab', the code did not react at all, and there were no errors. When I then switched back to 'encnet', the code again did not react at all, with no errors. How can I resolve this?

Jun 11 '20 06:06 dr18112004

That sounds weird. These are the best practices for setting up the environment: https://hangzhang.org/PyTorch-Encoding/notes/compile.html#detailed-steps

Jun 11 '20 06:06 zhanghang1989

Thank you very much for your quick reply. I will try these practices. Thanks again.

Jun 11 '20 06:06 dr18112004

Hi, I have followed the practices you provided, and loading the data works fine, but the code can't begin to train. Do you have any suggestions for solving this problem? Thanks again.

The output is as follows:

Using poly LR scheduler with warm-up epochs of 0!
Starting Epoch: 0
Total Epoches: 200
0%| | 0/1384 [00:00<?, ?it/s]
=>Epoch 0, learning rate = 0.0001, previous best = 0.0000

After this, training never starts.

Jun 15 '20 04:06 dr18112004

That's weird. Have you tried train_dist.py?

Jun 15 '20 04:06 zhanghang1989

I reduced the batch size to 16, and the code is now able to train properly. Thanks again.

Jun 15 '20 09:06 dr18112004
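
For anyone hitting a similar silent stall (progress bar stuck at 0 with no error), one generic way to see where the process is waiting is Python's standard-library faulthandler module. The sketch below is only an illustration and is not part of PyTorch-Encoding: run_with_hang_watchdog, step_fn, and timeout_s are hypothetical names, and step_fn stands in for whatever one training iteration looks like in your script.

```python
import faulthandler
import sys


def run_with_hang_watchdog(step_fn, num_steps, timeout_s=60.0, log_file=sys.stderr):
    """Call step_fn(step) num_steps times and return how many steps finished.

    If a single step takes longer than timeout_s seconds, the tracebacks of
    all Python threads are written to log_file. The process keeps running,
    so the dump only shows where it is stuck; it does not abort anything.
    log_file must be a real file object (it needs a file descriptor).
    """
    completed = 0
    for step in range(num_steps):
        # Arm the watchdog for this step (hypothetical per-step granularity).
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
        try:
            step_fn(step)
            completed += 1
        finally:
            # Disarm it so a finished step never triggers a late dump.
            faulthandler.cancel_dump_traceback_later()
    return completed
```

If the dump shows the main thread waiting inside data loading or a CUDA call, that would at least narrow down whether the stall comes from the input pipeline or from the GPU side, which may help decide whether a smaller batch size is worth trying.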
