PyTorch-Encoding

Some questions about train.py

Status: Open. dr18112004 opened this issue 5 years ago • 5 comments

Hi, thanks for your work. My environment is: Ubuntu 18.04, Python 3.6.8, PyTorch 1.4.0, CUDA 10.0. When I first ran train.py, there was an error about 'ninja'. I resolved it and obtained a result. But when I changed the model from 'encnet' to 'deeplab', the code didn't react at all and printed no errors. Then I switched back to 'encnet', and again the code didn't react at all and printed no errors. How can I resolve this?
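A silent hang right after a ninja problem is worth checking against how PyTorch JIT-compiles C++/CUDA extensions. This is an assumption about the cause, not something confirmed in this thread. `torch.utils.cpp_extension.load` needs `ninja` on the PATH and guards each build directory with a file named `lock`. If an earlier build process was killed or crashed, that file can be left behind, and later runs wait on it indefinitely without printing an error. The minimal sketch below uses only the standard library to report whether `ninja` is visible and to list any `lock` files under the build roots. The roots it scans (the `TORCH_EXTENSIONS_DIR` variable, `<tmp>/torch_extensions` for older releases such as 1.4, `~/.cache/torch_extensions` for newer ones) are assumed defaults that vary by PyTorch version, and the helper names are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import List


def candidate_extension_roots() -> List[Path]:
    """Directories where torch.utils.cpp_extension may put JIT builds (assumed defaults)."""
    roots = []
    env_root = os.environ.get("TORCH_EXTENSIONS_DIR")
    if env_root:
        roots.append(Path(env_root))
    # Assumed default for older PyTorch releases (e.g. 1.4): <tmp>/torch_extensions.
    roots.append(Path(tempfile.gettempdir()) / "torch_extensions")
    # Assumed default for newer PyTorch releases: ~/.cache/torch_extensions.
    roots.append(Path.home() / ".cache" / "torch_extensions")
    return roots


def find_stale_locks(roots: List[Path]) -> List[Path]:
    """Return every regular file named 'lock' found anywhere under the given roots."""
    locks = []
    for root in roots:
        if root.is_dir():
            locks.extend(p for p in root.rglob("lock") if p.is_file())
    return locks


if __name__ == "__main__":
    # shutil.which is only a proxy for the check PyTorch itself performs.
    print("ninja found at:", shutil.which("ninja") or "NOT FOUND")
    for lock in find_stale_locks(candidate_extension_roots()):
        # Delete a lock only when no other process is compiling that extension.
        print("possible stale lock:", lock)
```

If a `lock` file is listed and no other training or build process is running, removing it should let the next import of the package rebuild the extension instead of waiting.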

dr18112004, Jun 11 '20 06:06

That sounds weird. These are the best practices for setting up the environment: https://hangzhang.org/PyTorch-Encoding/notes/compile.html#detailed-steps

zhanghang1989, Jun 11 '20 06:06

Thank you very much for your quick reply. I will try these practices. Thanks again.

dr18112004, Jun 11 '20 06:06

Hi, I have followed the practices you provided. Loading the data works fine, but training never starts. Do you have any suggestions for solving this problem? Thanks again.

The output is as follows: "Using poly LR scheduler with warm-up epochs of 0! Starting Epoch: 0 Total Epoches: 200 0%| | 0/1384 [00:00<?, ?it/s] =>Epoch 0, learning rate = 0.0001, previous best = 0.0000". After that, the code never begins to train.
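The first line of that output names a "poly" learning-rate schedule with zero warm-up epochs. A poly schedule is commonly written as lr = base_lr · (1 − t/T)^0.9, where t is the global iteration and T is the total number of iterations. The minimal sketch below assumes that formulation, an exponent of 0.9, and a linear warm-up; it is an illustration of the general technique, not a copy of the repository's scheduler, and `poly_lr` is a hypothetical name. With warm-up set to 0, the warm-up branch never runs, so the first iteration uses the full base rate.

```python
def poly_lr(base_lr: float, iteration: int, total_iters: int,
            power: float = 0.9, warmup_iters: int = 0) -> float:
    """Poly decay base_lr * (1 - iteration / total_iters) ** power, with optional linear warm-up."""
    if total_iters <= 0 or not 0 <= iteration <= total_iters:
        raise ValueError("need total_iters > 0 and 0 <= iteration <= total_iters")
    lr = base_lr * (1.0 - iteration / total_iters) ** power
    if warmup_iters > 0 and iteration < warmup_iters:
        # Linear warm-up: scale the rate from 0 up to the poly value over warmup_iters steps.
        lr *= iteration / warmup_iters
    return lr


if __name__ == "__main__":
    # Numbers taken from the log: 200 epochs x 1384 iterations per epoch, base rate 0.0001.
    total = 200 * 1384
    print("%.4f" % poly_lr(0.0001, 0, total))
```

With those numbers (T = 276,800), the rate at iteration 0 is exactly the base rate, which prints as 0.0001 and is consistent with the logged "learning rate = 0.0001" for epoch 0.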

dr18112004, Jun 15 '20 04:06

That's weird. Have you tried train_dist.py?

zhanghang1989, Jun 15 '20 04:06

I reduced batch_size to 16, and the code now trains properly. Thanks again.

dr18112004, Jun 15 '20 09:06