Ordered-Neurons

How to train with main.py on multiple GPUs?


@yikangshen @shawntan Is there an easy way to train the model with main.py on multiple GPUs, to replicate the experiments?

When we wrap the model with model = nn.DataParallel(model) before calling train(), initialization descends through the LSTM stack into the ONLSTM cell to return the weights, but then it throws an error.

We also tried applying model = nn.DataParallel(model) after hidden = model.init_hidden(args.batch_size), but then the LinearDropConnect layer can't access its .weight tensors.
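Roughly, the second attempt looked like this (a minimal sketch; model, args.batch_size, and init_hidden are the objects from main.py):

```python
import torch.nn as nn

# Sketch of what we tried; model, args.batch_size and init_hidden are
# the objects defined in main.py. nn.DataParallel only exposes
# forward(), so custom methods like init_hidden have to be called
# through the .module attribute of the wrapper.
model = nn.DataParallel(model)
hidden = model.module.init_hidden(args.batch_size)
```

Even with the .module workaround for init_hidden, the forward pass still fails inside LinearDropConnect as described above.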

alvations avatar May 28 '19 10:05 alvations

We didn't try training the model on multiple GPUs. You may need to rewrite the code for the LinearDropConnect function.
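One possible direction (an untested sketch, not code from this repo): make the layer stateless by sampling the DropConnect mask inside forward(), so DataParallel replicas don't depend on a masked weight cached on the master copy. Note that the real LinearDropConnect samples one mask per sequence, whereas this resamples on every call:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearDropConnectDP(nn.Linear):
    """Hypothetical DataParallel-friendly DropConnect layer: no masked
    weight is cached on the module, so every replica can run
    independently. Unlike the original LinearDropConnect, the mask is
    resampled on every forward call instead of once per sequence."""

    def __init__(self, in_features, out_features, bias=True, dropout=0.0):
        super().__init__(in_features, out_features, bias=bias)
        self.dropout = dropout

    def forward(self, input):
        weight = self.weight
        if self.training and self.dropout > 0.0:
            # Sample a fresh Bernoulli keep-mask; nothing is written
            # back onto self.weight, so replicas stay consistent.
            mask = torch.bernoulli(
                torch.full_like(weight, 1.0 - self.dropout))
            weight = weight * mask / (1.0 - self.dropout)
        return F.linear(input, weight, self.bias)
```

Inverted scaling at train time keeps the eval-time forward identical to a plain nn.Linear.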

yikangshen avatar Jun 12 '19 01:06 yikangshen

Another question: there seems to be no speed-up using a GPU compared with a CPU; have you run into the same problem? Both take 280–290 s per epoch.
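To rule out a setup problem, a generic check like the one below can confirm the model and batches actually live on a CUDA device (report_devices is a made-up helper; model and the batch tensors are whatever main.py builds):

```python
import torch

def report_devices(model, *tensors):
    # Generic sanity check: confirm the parameters and inputs actually
    # live on a CUDA device before comparing epoch times.
    print("CUDA available:", torch.cuda.is_available())
    print("model parameters on:", next(model.parameters()).device)
    for i, t in enumerate(tensors):
        print("tensor %d on: %s" % (i, t.device))
```

If everything is already on the GPU, the per-timestep Python loop in the ON-LSTM stack may simply limit how much the GPU can help at these batch sizes.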

BuaaAlban avatar Jun 26 '19 03:06 BuaaAlban


Hi @alvations, just wanted to know: have you figured this out? Best

Shiweiliuiiiiiii avatar Sep 24 '19 18:09 Shiweiliuiiiiiii