theano_alexnet
Execute alexnet using the libgpuarray backend
I am trying to execute alexnet using the new libgpuarray backend for 1 GPU. The modifications I have made to the 1 GPU sample are in 1gpu_libgpuarray_patch.txt.
However, with these changes I get the following error:
ValueError: ('The following error happened while compiling the node', DnnVersion(), '\n', 'context name None is already defined')
Complete error log - 1gpu_libgpuarray_error.txt
If I further update train.py to use theano.gpuarray.use("cuda") instead of theano.gpuarray.use(config['gpu']), then it starts training. But I don't think that this is correct. Please advise.
@deepali-c
The changes needed to make the single-GPU train.py work involve changing any sandbox.cuda functions to their gpuarray alternatives, using device='cuda0' instead of device='gpu0', and moving any import theano to after the device context is set up. The device context can be set up like this:
https://github.com/uoguelph-mlrg/Theano-MPI/blob/master/theanompi/models/test_model.py#L11
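For reference, a minimal sketch of that ordering, assuming config['gpu'] holds a device string such as 'cuda0', and keeping floatX=float32 in the flags since the model works in float32:

import os

# Set the device (and floatX) before Theano is imported, so that the
# gpuarray backend binds to the right GPU when it initializes.
os.environ['THEANO_FLAGS'] = 'device={0},floatX=float32'.format(config['gpu'])  # e.g. 'cuda0'

import theano
import theano.gpuarray

# Fetch the default context Theano created for this process.
ctx = theano.gpuarray.type.get_context(None)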
The parallel loading part is also different, but it is still based on a socket and an IPC handle. See
https://github.com/uoguelph-mlrg/Theano-MPI/blob/master/theanompi/models/data/proc_load_mpi.py#L126
The two-GPU version train_2gpu.py will require replacing the pycuda.device_d2d() call and the summation function with the pygpu.collectives.allreduce() function.
Since the method used in theano_alexnet depends strongly on the CudaNdarray backend of Theano versions < 0.9 and on the pycuda library, we'd better make a new branch for trying the new GpuArray backend. However, this will just redo some parts of Theano-MPI.
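A rough sketch of what that replacement could look like, assuming each worker process already has its Theano context set up, and with rank, ndev, and the channel used to exchange the clique id left as placeholders:

import theano.gpuarray
from pygpu import collectives

ctx = theano.gpuarray.type.get_context(None)

# Build an NCCL communicator spanning the worker processes. One worker
# creates a clique id and sends its comm_id bytes to the other (for example
# over the existing multiprocessing queue); both then construct GpuComm
# with the same id.
clique_id = collectives.GpuCommCliqueId(context=ctx)
# ... exchange clique_id.comm_id between the two processes here ...
gpucomm = collectives.GpuComm(clique_id, ndev, rank)

def allreduce_params(params):
    # Sum each parameter's GPU buffer across the workers, then average.
    for p in params:
        src = p.container.value                      # pygpu GpuArray backing the shared variable
        gpucomm.all_reduce(src, op="sum", dest=src)  # assumes in-place all-reduce is allowed
        p.set_value(p.get_value() / ndev)            # divide on the host for simplicity

The division by ndev could instead stay on the GPU or be folded into the learning rate; the host round-trip here just keeps the sketch short.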
I have tried the two approaches mentioned below for using the new backend with 1 GPU:
- Modified train.py to set up the device context as in Theano-MPI (along with changes to use gpuarray alternatives):
import os
os.environ['THEANO_FLAGS'] = 'device={0}'.format(config['gpu'])
import theano.gpuarray
# This is a bit of black magic that may stop working in future
# theano releases
ctx = theano.gpuarray.type.get_context(None)
This gives the following error:
THEANO_FLAGS=device=cuda0,mode=FAST_RUN,floatX=float32 python train.py
....
...#more output here
.....
`TypeError: Cannot convert Type TensorType(float64, 4D) (of Variable HostFromGpu(gpuarray).0) into Type TensorType(float32, 4D). You can try to manually convert HostFromGpu(gpuarray).0 into a TensorType(float32, 4D).`
- Updated train.py according to the patch I shared earlier in this thread; then it works fine.
The difference is that in #1 I am trying to set up the device context using the new method instead of the pycuda GPU setup. It looks like I have missed something while doing so; please advise.
@deepali-c
The error looks like something with floatX.
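A quick way to check which floatX Theano actually picked up, and to cast any float64 host data before it reaches the float32 graph, is something along these lines:

import numpy as np
import theano

print(theano.config.floatX)   # should print 'float32' when the flags are picked up

# numpy produces float64 by default, so cast host-side arrays explicitly
img_batch = np.random.rand(4, 4)                      # dtype float64
img_batch = img_batch.astype(theano.config.floatX)    # now float32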
Anyway, I just created a pygpu branch, and the single-GPU train.py is working. You can compare your patch with this commit to see what the necessary changes are.
There are some dependency differences. To use this branch, I recommend upgrading to the bleeding-edge libgpuarray/pygpu and Theano. I just tried them and it is working.
$ backend=gpuarray python train.py
Using cuDNN version 5110 on context None
Mapped name None to device cuda0: GeForce GTX TITAN Black (0000:83:00.0)
... building the model
conv (cudnn) layer with shape_in: (3, 227, 227, 256)
conv (cudnn) layer with shape_in: (96, 27, 27, 256)
conv (cudnn) layer with shape_in: (256, 13, 13, 256)
conv (cudnn) layer with shape_in: (384, 13, 13, 256)
conv (cudnn) layer with shape_in: (384, 13, 13, 256)
fc layer with num_in: 9216 num_out: 4096
dropout layer with P_drop: 0.5
fc layer with num_in: 4096 num_out: 4096
dropout layer with P_drop: 0.5
softmax layer with num_in: 4096 num_out: 1000
... training
shared_x information received
img_mean received
training @ iter = 0
training cost: 6.91343069077
training error rate: 1.0
time per 20 iter: 28.7199730873
@hma02, thank you so much for these changes. The 1 GPU example works with libgpuarray on my setup as well.
I am working on the 2 GPU sample next.
@deepali-c
I just made train_2gpu.py work based on pygpu collectives, which in turn is based on NCCL. So you need to install NCCL, libgpuarray, and its Python wrapper pygpu in order to run this.
Thanks @hma02 .
I observed the following error while executing the 2 GPU sample with the gpuarray backend:
Process Process-2:
Process Process-1:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "train_2gpu.py", line 356, in train_net
self._target(*self._args, **self._kwargs)
File "train_2gpu.py", line 356, in train_net
gpu_send_queue.put(this_val_error)
gpu_send_queue.put(this_val_error)
UnboundLocalError: local variable 'gpu_send_queue' referenced before assignment
UnboundLocalError: local variable 'gpu_send_queue' referenced before assignment
I made the following change in train_2gpu.py, and then it could proceed without the above error.
- gpu_send_queue.put(this_val_error)
- that_val_error = gpu_recv_queue.get()
- this_val_error = (this_val_error + that_val_error) / 2.
-
- gpu_send_queue.put(this_val_loss)
- that_val_loss = gpu_recv_queue.get()
- this_val_loss = (this_val_loss + that_val_loss) / 2.
+ if os.environ['backend']=='gpuarray':
+ exch.exchange()
+ else:
+ gpu_send_queue.put(this_val_error)
+ that_val_error = gpu_recv_queue.get()
+ this_val_error = (this_val_error + that_val_error) / 2.
+
+ gpu_send_queue.put(this_val_loss)
+ that_val_loss = gpu_recv_queue.get()
+ this_val_loss = (this_val_loss + that_val_loss) / 2.
@deepali-c
Sorry, I forgot to debug the validation part. See the last commit regarding this issue.
The exch.exchange() call is for exchanging the total_params. What we need here is to average the validation error and cost over the two workers, which is similar, but exch is an instance already bound to those total_params, so it won't help with this.
Thanks @hma02.
I got it now.