Multiple GPU issue

Open hnhuang opened this issue 7 years ago • 3 comments

Hi,

My model runs on a single GPU, but it fails on multiple GPUs. Here is my code:

```python
x_train, y_train = batch_reader.get_batch()
gpu_list = ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]
model_dist.compile(loss=losses.dist_loss_cls(C.max_radius),
                   optimizer=optimizer,
                   context=gpu_list)
model_dist.fit(x_train, y_train,
               batch_size=20,
               nb_epoch=num_epochs,
               callbacks=[checkpoint_fixed_name])
```
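The hard-coded `gpu_list` assumes that four GPU ordinals (0 to 3) are visible to CUDA. A minimal sketch that derives the same `"gpu(i)"` strings from a device count instead; `make_gpu_context_list` is a hypothetical helper (not part of Keras or MXNet), and the count itself is assumed to come from elsewhere, for example `nvidia-smi`.

```python
def make_gpu_context_list(num_gpus):
    """Build context strings in the "gpu(i)" form used in the snippet above.

    Hypothetical helper: num_gpus must not exceed the number of GPUs
    that CUDA actually exposes to this process.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU, got %d" % num_gpus)
    return ["gpu(%d)" % i for i in range(num_gpus)]


print(make_gpu_context_list(4))  # ['gpu(0)', 'gpu(1)', 'gpu(2)', 'gpu(3)']
```

With a count that matches the visible devices, no context string can name a nonexistent ordinal.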

The error I got was:

```
RuntimeError: simple_bind error. Arguments:
input_1: (5, 1L, 32L, 32L, 32L)
[13:36:31] src/storage/storage.cc:59: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: invalid device ordinal
```

Would anyone please help me? Thanks.

hnhuang, Aug 25 '17 18:08
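The input shape in the error, `(5, 1L, 32L, 32L, 32L)`, is consistent with `batch_size=20` divided evenly across the four contexts (the `L` suffix is Python 2's long-integer notation). That suggests the shapes themselves are fine and the failure happens when memory is requested on a device index CUDA does not recognize. The sketch below only illustrates that arithmetic. It assumes an even split along the batch axis, which is an assumption about how the backend partitions data, and `per_device_shapes` is a hypothetical helper.

```python
def per_device_shapes(batch_shape, num_devices):
    """Split the leading (batch) dimension evenly across devices.

    Assumes an even data-parallel split; raises if the batch size
    is not divisible by the number of devices.
    """
    batch = batch_shape[0]
    if batch % num_devices != 0:
        raise ValueError("batch size %d is not divisible by %d devices"
                         % (batch, num_devices))
    per_device = batch // num_devices
    return [(per_device,) + tuple(batch_shape[1:]) for _ in range(num_devices)]


shapes = per_device_shapes((20, 1, 32, 32, 32), 4)
print(shapes[0])  # (5, 1, 32, 32, 32)
```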

The issue seems to be that you don't have that many GPUs. Could you run the "nvidia-smi" command in a terminal and confirm whether it lists 4 GPUs?

sandeep-krishnamurthy, Aug 25 '17 18:08
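The check suggested above can be scripted. A minimal sketch, assuming `nvidia-smi -L` is on the PATH and prints one line per GPU in the form `GPU <n>: <name> (UUID: ...)`; the function names are hypothetical, and the sample listing (including its UUIDs) is made up for illustration.

```python
import subprocess


def count_gpus_from_listing(listing):
    """Count lines of the form 'GPU <n>: ...' in `nvidia-smi -L` output."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def count_physical_gpus():
    """Run `nvidia-smi -L`; return 0 if the tool is missing or fails."""
    try:
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return 0
    return count_gpus_from_listing(result.stdout)


# Made-up sample output for illustration only
sample = ("GPU 0: Tesla K80 (UUID: GPU-aaaa)\n"
          "GPU 1: Tesla K80 (UUID: GPU-bbbb)\n")
print(count_gpus_from_listing(sample))  # 2
```

Note that this counts physical GPUs; it does not tell you how many of them CUDA exposes to a given process.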

I do have 4 GPUs.

hnhuang, Aug 25 '17 19:08

I tried the ResNet50 example at https://github.com/dmlc/keras/blob/master/examples/cifar10_resnet50.py with multiple GPUs, and it seems to work fine. Could you share more details about your setup: the MXNet version, any CUDA-specific environment variables you have set, and the code you are using?

sandeep-krishnamurthy, Aug 28 '17 03:08
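One CUDA environment variable worth checking here is `CUDA_VISIBLE_DEVICES`. `nvidia-smi` enumerates GPUs through NVML and lists every physical device regardless of that variable. CUDA, however, only exposes the listed devices and renumbers them from 0. If the variable names fewer than four devices, `gpu(2)` or `gpu(3)` would be invalid ordinals even though `nvidia-smi` shows 4 GPUs. This is one possible cause to rule out, not a confirmed diagnosis for this issue. The sketch below is a simplified, hypothetical estimate (`visible_gpu_count` is not a library function). It handles only integer indices, not UUID or MIG entries, and follows the documented CUDA rule that parsing stops at the first invalid index.

```python
import os


def visible_gpu_count(physical_count, env=None):
    """Estimate how many GPU ordinals CUDA exposes to this process.

    Simplified: only integer indices in CUDA_VISIBLE_DEVICES are handled;
    entries from the first invalid one onward are ignored.
    """
    if env is None:
        env = os.environ
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return physical_count  # unset: every physical GPU is visible
    count = 0
    for token in value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            break  # invalid entry: it and all later entries are ignored
        count += 1
    return count


print(visible_gpu_count(4, {"CUDA_VISIBLE_DEVICES": "0,1"}))  # 2
```

For example, with `CUDA_VISIBLE_DEVICES=0,1`, only `gpu(0)` and `gpu(1)` would be valid contexts for the process.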