
Learning rate

Open DQDH opened this issue 6 years ago • 9 comments

I don't know how to get the parameters of the Ours-ResNet segmentation network. Can you explain the parameters? Thanks.

DQDH avatar Oct 21 '18 14:10 DQDH

I tried changing the learning rate to 0.01 and the batch size to 4. The loss decreased to 0.0403 within one epoch (Iter: 37000/39675; the epoch had almost finished when it failed), but the program often raises an error like this:

validating ...
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

So, are there any parameters I need to change? Or any advice on the error?

LeiyuanMa avatar Oct 22 '18 00:10 LeiyuanMa

I tried changing the learning rate to 0.01 with batchsize=4 and, due to limited GPU resources, I set model = torch.nn.DataParallel(model,device_ids=[0]). After that, there is always an error like this:

Iter:36900/39675 Loss:0.0413 imps:3.5 Fin:Mon Oct 22 03:56:00 2018 lr: 0.0009
Iter:36950/39675 Loss:0.0363 imps:3.5 Fin:Mon Oct 22 03:55:56 2018 lr: 0.0009
Iter:37000/39675 Loss:0.0403 imps:3.5 Fin:Mon Oct 22 03:55:53 2018 lr: 0.0009

validating ...
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

The program can't finish an epoch, but the loss decreased to 0.0403. Is that acceptable, or does it need more epochs? Do you have any advice on this error?

LeiyuanMa avatar Oct 22 '18 00:10 LeiyuanMa
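
A side note on the log above: a printed lr of 0.0009 near iteration 37000, starting from 0.01, is what a polynomial decay with power 0.9 would give, whereas a plain linear decay would print 0.0007 at the same point. The thread does not say which schedule the training script uses, so the sketch below is only a consistency check under that assumption. The function name `poly_lr` and the power 0.9 are hypothetical and not taken from the repository.

```python
def poly_lr(base_lr, step, max_step, power=0.9):
    """Polynomial decay: base_lr * (1 - step / max_step) ** power.

    Assumption: power=0.9 is a guess that happens to match the logged
    values; it is not confirmed anywhere in this thread.
    """
    return base_lr * (1.0 - step / max_step) ** power


if __name__ == "__main__":
    base_lr, max_step = 0.01, 39675  # values quoted in the comment above
    for step in (36900, 36950, 37000):
        lr = poly_lr(base_lr, step, max_step)
        print(f"Iter:{step}/{max_step} lr: {lr:.4f}")
```

If this guess is right, the learning rate was already close to zero when the crash happened, so the failure during "validating ..." is unlikely to be caused by the learning rate itself.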

@hardBird123, I trained with Adam, setting the initial learning rate to 0.001, but I didn't try to find the optimal learning rate. You can get better results than mine by adopting SGD or just following the method described in https://arxiv.org/pdf/1611.10080.pdf.

jiwoon-ahn avatar Oct 23 '18 09:10 jiwoon-ahn

@LeiyuanMa, Sorry, I can't help you with that error. It is probably related to a memory leak. In my case, the number of training epochs does not change the performance much, and I haven't tested whether training for 15 epochs is best for the network.

jiwoon-ahn avatar Oct 23 '18 09:10 jiwoon-ahn

OK, thanks. I want to confirm: is [ilsvrc-cls_rna-a1_cls1000_ep-0001.params] the pretrained weights file for training the ResNet38 segmentation network?

DQDH avatar Oct 23 '18 09:10 DQDH

@hardBird123, Yes, that is the right file for the segmentation network.

jiwoon-ahn avatar Oct 23 '18 09:10 jiwoon-ahn

Thanks. So is loss=0.0403 acceptable?

LeiyuanMa avatar Oct 23 '18 09:10 LeiyuanMa

Which lr_type (fixed (default)/step/linear) should I choose when training the ResNet38 segmentation network?

DQDH avatar Oct 30 '18 13:10 DQDH
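
For readers unfamiliar with the three option names in the question, here is a minimal sketch of what "fixed", "step", and "linear" schedules commonly mean. Only the option names come from the question. The function `learning_rate`, its parameters (`step_epochs`, `gamma`), and the exact formulas are assumptions for illustration, so the training script being asked about may define them differently.

```python
def learning_rate(lr_type, base_lr, epoch, num_epochs,
                  step_epochs=(10,), gamma=0.1):
    """Return the learning rate for a given epoch (hypothetical helper).

    fixed  : constant base_lr
    step   : multiply by gamma at each epoch listed in step_epochs
    linear : decay linearly from base_lr at epoch 0 to 0 at num_epochs
    """
    if lr_type == "fixed":
        return base_lr
    if lr_type == "step":
        drops = sum(1 for e in step_epochs if epoch >= e)
        return base_lr * gamma ** drops
    if lr_type == "linear":
        return base_lr * (1.0 - epoch / num_epochs)
    raise ValueError(f"unknown lr_type: {lr_type!r}")


if __name__ == "__main__":
    # 15 epochs and base_lr=0.001 are the figures mentioned in this thread.
    for lr_type in ("fixed", "step", "linear"):
        rates = [learning_rate(lr_type, 0.001, e, 15) for e in (0, 5, 10, 14)]
        print(lr_type, rates)
```

A fixed schedule is the simplest to reason about with an adaptive optimizer such as Adam, while step and linear decay are more often paired with SGD. Which one works best here was not reported in the thread.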

Hello, I'm a student running this code, and I get a runtime error. Can you give me some tips about this issue? [image: 2019-02-24 204758]

RuntimeError: size mismatch, m1: [1 x 20], m2: [1 x 20]

suoranxiu avatar Feb 28 '19 08:02 suoranxiu
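
In older PyTorch releases, a message of this form comes from a 2-D matrix multiplication (for example `torch.mm` or a linear layer) in which the number of columns of m1 differs from the number of rows of m2. Here m1 is [1 x 20] and m2 is [1 x 20], so the inner dimensions are 20 and 1. The thread does not show which operands in the script are involved, so the sketch below only illustrates the shape rule in plain Python. The helper `matmul_shape` is hypothetical.

```python
def matmul_shape(m1, m2):
    """Return the result shape of a 2-D matrix product m1 @ m2.

    m1 and m2 are (rows, cols) tuples. Raises ValueError when the inner
    dimensions disagree, mimicking the wording of the error above.
    """
    rows1, cols1 = m1
    rows2, cols2 = m2
    if cols1 != rows2:
        raise ValueError(
            f"size mismatch, m1: [{rows1} x {cols1}], m2: [{rows2} x {cols2}]"
        )
    return (rows1, cols2)


if __name__ == "__main__":
    try:
        matmul_shape((1, 20), (1, 20))   # inner dims 20 vs 1: rejected
    except ValueError as err:
        print(err)
    print(matmul_shape((1, 20), (20, 1)))  # transposed second operand
```

Depending on what the code intended, possible fixes are transposing one operand so the inner dimensions match, or using an element-wise product if both tensors are meant to have the same shape.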