crnn.pytorch

Loss descent speed differs greatly between multi-GPU and single-GPU training with the same learning rate

Open Alex220284 opened this issue 7 years ago • 5 comments

Alex220284 avatar Oct 25 '18 08:10 Alex220284

When I use 4 GPUs to train the model with batch size 256 and learning rate 0.001, the loss descends very slowly, but when using just a single GPU with batch size 64 and learning rate 0.001, it seems to converge very fast.


> When I use 4 GPUs to train the model with batch size 256 and learning rate 0.001, the loss descends very slowly, but when using just a single GPU with batch size 64 and learning rate 0.001, it seems to converge very fast.

Were you able to train the model on multiple GPUs? Can you tell me how? @Alex220284

Fighting-JJ avatar Mar 08 '19 08:03 Fighting-JJ

The loss is normalized by the sample number, so you should multiply the learning rate by 4 to get the same convergence speed when you set batch size 256. Refer to this paper for details: https://arxiv.org/abs/1706.02677
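A minimal sketch of that linear scaling rule (the variable names here are illustrative, not from this repo): when the effective batch size grows by a factor k, multiply the learning rate by the same factor k.

```python
# Linear scaling rule (https://arxiv.org/abs/1706.02677):
# if the effective batch size grows by a factor k, scale the
# learning rate by k as well.
base_lr = 0.001        # learning rate tuned for batch size 64 on one GPU
base_batch_size = 64
new_batch_size = 256   # 4 GPUs x 64 samples each

k = new_batch_size / base_batch_size   # k = 4.0
scaled_lr = base_lr * k
print(scaled_lr)                       # 0.004
```

With the original lr of 0.001 at batch size 256, each parameter update averages gradients over 4x more samples but takes the same step size, which matches the slower loss descent reported above.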

vc384 avatar Mar 19 '19 08:03 vc384

When I use multiple GPUs, the test function val fails at the assertion

assert t.numel() == length.sum(), "texts with length: {} does not match declared length: {}".format(t.numel(), length.sum())

with: AssertionError: texts with length: 19328 does not match declared length: 38656

In preds = crnn(image), the image batch size is 128, but preds has batch size 64. Can you tell me why? Thank you very much!
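For readers hitting the same assertion, here is a minimal sketch with plain Python lists (not the repo's tensors) of what that check verifies, and why the numbers above point at a halved batch:

```python
# The val function's check: the flattened encoded labels must contain
# exactly sum(lengths) symbols, one length entry per image in the batch.
encoded_labels = [1, 2, 3, 4, 5, 6]  # 6 label indices for the whole batch
lengths = [2, 4]                     # per-image label lengths: 2 + 4 = 6
assert len(encoded_labels) == sum(lengths)

# In the error above, 38656 == 2 * 19328: the declared lengths cover twice
# as many characters as the texts tensor, which is consistent with preds
# holding only 64 of the 128 images in the batch.
assert 2 * 19328 == 38656
```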

cqray1990 avatar Aug 05 '19 08:08 cqray1990

> When I use multiple GPUs, the test function val fails at the assertion assert t.numel() == length.sum(), "texts with length: {} does not match declared length: {}".format(t.numel(), length.sum()) with: AssertionError: texts with length: 19328 does not match declared length: 38656
>
> In preds = crnn(image), the image batch size is 128, but preds has batch size 64. Can you tell me why? Thank you very much!

Did you solve this problem? Thanks

DuckJ avatar Sep 29 '19 08:09 DuckJ