keras
Multiple GPU issue
Hi,
My model runs on a single GPU, but it fails on multiple GPUs. Here is my code:
    x_train, y_train = batch_reader.get_batch()
    # one context string per GPU to train on
    gpu_list = ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]
    model_dist.compile(loss=losses.dist_loss_cls(C.max_radius), optimizer=optimizer, context=gpu_list)
    model_dist.fit(x_train, y_train, batch_size=20, nb_epoch=num_epochs, callbacks=[checkpoint_fixed_name])
The error I got was:
    RuntimeError: simple_bind error. Arguments: input_1: (5, 1L, 32L, 32L, 32L)
    [13:36:31] src/storage/storage.cc:59: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: invalid device ordinal
Could anyone please help? Thanks.
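The leading 5 in the error's input_1 shape is consistent with the global batch of 20 being split evenly across the four entries in gpu_list, so the per-device shapes look fine and the failure is in the device lookup itself. The pure-Python sketch below only illustrates that arithmetic. It does not touch MXNet; the even data-parallel split is an assumption, the helper name per_device_batch_shapes is hypothetical, and the sample shape (1, 32, 32, 32) is read off the error message.

```python
def per_device_batch_shapes(batch_size, sample_shape, num_devices):
    """Split a global batch across devices, assuming an even data-parallel split.

    Returns one shape tuple per device: (per_device_batch, *sample_shape).
    """
    if num_devices <= 0:
        raise ValueError("need at least one device")
    if batch_size % num_devices != 0:
        # Uneven splits are deliberately left out of this sketch.
        raise ValueError("batch_size must be divisible by num_devices")
    per_device = batch_size // num_devices
    return [(per_device,) + tuple(sample_shape) for _ in range(num_devices)]


# Values taken from the question: 4 contexts, batch_size=20.
gpu_list = ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]
shapes = per_device_batch_shapes(20, (1, 32, 32, 32), len(gpu_list))
for ctx, shape in zip(gpu_list, shapes):
    print(ctx, shape)
```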
The issue seems to be that you don't have that many GPUs: "invalid device ordinal" means CUDA was asked for a device index that is not available to the process. Could you run "nvidia-smi" in a terminal and confirm that you have 4 GPUs?
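A minimal sketch for deriving the context list from the GPUs that "nvidia-smi -L" reports, instead of hard-coding four entries. It assumes the usual one-line-per-GPU output of the form "GPU 0: <name> (UUID: ...)"; the helper names count_gpus, gpu_contexts and detect_gpu_contexts are hypothetical, and only the "gpu(i)" string form comes from the question.

```python
import re
import subprocess


def count_gpus(nvidia_smi_l_output):
    """Count lines of the form 'GPU <n>: ...' in `nvidia-smi -L` output."""
    return sum(1 for line in nvidia_smi_l_output.splitlines()
               if re.match(r"^GPU \d+:", line.strip()))


def gpu_contexts(num_gpus):
    """Build context strings in the same "gpu(i)" form used in the question."""
    return ["gpu(%d)" % i for i in range(num_gpus)]


def detect_gpu_contexts():
    """Run `nvidia-smi -L` (must be on PATH) and return one context per GPU."""
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                         text=True, check=True).stdout
    return gpu_contexts(count_gpus(out))


# Offline example with made-up output for a two-GPU machine.
sample = "GPU 0: Example GPU (UUID: GPU-aaaa)\nGPU 1: Example GPU (UUID: GPU-bbbb)\n"
print(gpu_contexts(count_gpus(sample)))
```

On a real machine, detect_gpu_contexts() would replace the hard-coded gpu_list.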
I do have 4 GPUs.
I tried the ResNet50 example at https://github.com/dmlc/keras/blob/master/examples/cifar10_resnet50.py with multiple GPUs and it seems to work fine. Could you share more details about your setup: the MXNet version, any CUDA-specific environment variables you have set (e.g. CUDA_VISIBLE_DEVICES), and the code you are using?
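Having four physical GPUs does not guarantee the process can see four: CUDA_VISIBLE_DEVICES restricts and renumbers the devices a CUDA process enumerates, while nvidia-smi generally lists every GPU regardless. The sketch below is a simplified model of that variable under stated assumptions: only comma-separated integer indices are handled (UUID entries raise an error), and enumeration stops at the first out-of-range index. The helper names are hypothetical, and the physical count of 4 is taken from the thread.

```python
import os


def visible_device_count(physical_count, cuda_visible_devices=None):
    """Estimate how many GPUs a CUDA process can see (simplified model).

    None means the variable is unset, so all physical GPUs are visible.
    Otherwise the value is read as comma-separated physical indices and
    enumeration stops at the first empty or out-of-range entry.
    """
    if cuda_visible_devices is None:
        return physical_count
    count = 0
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token:
            break  # an empty value hides all devices
        if not token.isdigit():
            raise ValueError("entry %r not handled in this sketch" % token)
        if int(token) >= physical_count:
            break  # an invalid index hides it and everything after it
        count += 1
    return count


def invalid_ordinals(requested, visible_count):
    """Return the requested ordinals that are out of range for this process."""
    return [i for i in requested if not 0 <= i < visible_count]


visible = visible_device_count(4, os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible GPUs:", visible)
print("invalid ordinals:", invalid_ordinals([0, 1, 2, 3], visible))
```

If this reports fewer than four visible GPUs, unsetting the variable (or setting it to 0,1,2,3) before starting Python would be a reasonable first thing to try.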