neuralconvo
Multiple GPU support
Hi Friends,
Do we have support for running training on multiple GPUs to save time? I have a machine with 4 GPUs, but it looks like only one of them is being utilised.
Also, when I try to train over the complete dataset using the following command, it takes around 9 hours per epoch. Is this time expected, or am I doing something wrong here?
th train.lua --cuda --dataset 0 --hiddenSize 1000
Thanks
For your reference, training on the whole dataset takes me about 5 hours per epoch on an Nvidia GTX 980.
Thanks @jamesweb1 for that. I am more or less on the same timeline!
Do you think that using multiple GPUs would help bring the training time down considerably? Right now, with 50 epochs per training run and 5-6 hours per epoch, it takes too much time (~300 hrs), making it difficult to try out and find the best parameters (batchSize, hidden layer size, dataset size) for the task. Do you have any recommendations here?
Yes, it takes a lot of time. That is why I train on a small subset only (perhaps dataset = 20000). From these experiments I can find better parameters, and then extend to the whole dataset. I'd like to train on multiple GPUs, but I don't have any other resources right now. :(
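If it helps, a cheap way to compare settings is to run the existing script on the small subset with one option varied at a time, for example (using only the flags already shown above; the hidden sizes here are just illustrative):

th train.lua --cuda --dataset 20000 --hiddenSize 500
th train.lua --cuda --dataset 20000 --hiddenSize 1000
th train.lua --cuda --dataset 20000 --hiddenSize 2000

Whichever setting looks best on the subset can then be re-run with --dataset 0 on the full data.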
Could we not pool our experimentation and report results somewhere, so as to avoid duplicate work? Create a todo list with parameter settings to be validated and I will pick some up and contribute findings!
That would be great. Has anybody started collecting statistics/benchmarks already?
I am also looking at adding multiple GPU support; has anyone made any progress yet?
@svenwoldt that's a great idea! However, we don't have a good metric for measuring the quality of the model yet. #38 adds a validation set; maybe adding a test set would be the best way to do this?
Re: multiple GPU support. I'm not sure how this could be done, and I only have 1 GPU at my disposal, so I'd need help on this :)
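For anyone who wants to experiment with this: Torch's cunn package ships nn.DataParallelTable, which replicates a module across several GPUs and splits each mini-batch between them. Below is a minimal sketch of the idea only; the model variable is a stand-in, not neuralconvo's actual seq2seq object, and the recurrent modules from the rnn package may need extra care before this works here.

require 'cutorch'
require 'cunn'

-- Placeholder module standing in for neuralconvo's encoder/decoder.
local model = nn.Linear(1000, 1000):cuda()

-- Replicate the module on GPUs 1-4 and split each mini-batch
-- along dimension 1 (the batch dimension) across them.
local dpt = nn.DataParallelTable(1)
dpt:add(model, {1, 2, 3, 4})
model = dpt:cuda()

-- Training then proceeds as usual; gradients from the replicas are
-- accumulated back onto the primary GPU after each backward pass.

This mainly speeds up the per-batch matrix work, so treat it as a starting point rather than a drop-in fix for this repo.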
Did anyone make any progress on this? I'm also looking for a multi-GPU solution.