Add support for GPU and multiGPU
We should support GPU and multiGPU training and sampling, allowing users to:
- Select the desired behavior with a parameter
- Select the best available option in case none is provided
EDIT
The required changes affect two scopes of the project:

**During installation time:** when installing the package, we need to make the user choose between `tensorflow` and `tensorflow-gpu` and install it before proceeding with the installation of `TGAN`.

- In `README.md`, we need to add a new subsection called **Tensorflow** under **Requirements**, explaining that users need to install Tensorflow on their own, selecting between the two packages, and that in case of doubt they should install the non-GPU one.
- In `setup.py`, we need to remove the `tensorflow` dependency from `install_requires`.
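Once `tensorflow` is no longer pulled in by `install_requires`, a user who skips that step would only find out through a bare `ModuleNotFoundError`. A minimal sketch of a friendlier check, assuming a hypothetical helper `check_tensorflow_installed` (not part of TGAN) that could be called early, for example at import time of the package. Both the `tensorflow` and `tensorflow-gpu` distributions install a module named `tensorflow`, so a single check covers either choice:

```python
import importlib.util


def check_tensorflow_installed(module_name: str = "tensorflow") -> None:
    """Raise a helpful ImportError if TensorFlow cannot be found.

    Hypothetical helper: `module_name` is only a parameter so the check
    can be exercised with other modules.
    """
    # find_spec returns None when the top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            "TGAN requires TensorFlow, which is not installed automatically. "
            "Install either `tensorflow` (CPU) or `tensorflow-gpu` (GPU) first; "
            "if in doubt, install `tensorflow`."
        )
```

The check only looks for the module and does not import it, so it stays cheap even when the GPU build is installed.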
**During execution time:** when creating the instance of `TGANModel`, we should check the `gpu` argument and, depending on its value, use one trainer or the other.

- In `tgan.trainer.MultiGPUTrainer`:
    - Change the `__init__` signature so the arguments follow the same order as in `GANTrainer` (with `nr_gpu` at the end).
    - Change `nr_gpu` to `gpus`, which will expect a list of integers.
    - Adapt the rest of the method to the new format of `gpus` (lines 117, 120, 136, 142, 143).
- In `tgan.model.TGANModel`:
    - Add a method `get_gpus` to `TGANModel` with the following code (a `@staticmethod`, since it does not take `self`):

      ```python
      from tensorflow.python.client import device_lib

      @staticmethod
      def get_gpus():
          # GPU indices parsed from device names such as '/device:GPU:1'.
          # (x.locality.bus_id is the NUMA bus id, not a GPU index.)
          return [int(x.name.split(':')[-1]) for x in device_lib.list_local_devices() if x.device_type == 'GPU']
      ```

    - In `TGANModel.__init__`, rename the argument `gpu` to `gpus` and change line 627 to:

      ```python
      # Requires `import os` at module level.
      self.gpus = gpus or self.get_gpus()
      if self.gpus:
          os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(map(str, self.gpus))
      ```

    - Add a method `get_trainer(self, input_queue)` that will:
        - If `self.gpus` has more than one element, return a new instance of `MultiGPUTrainer`, instantiated with `self.gpus`.
        - In any other case, return a `GANTrainer` as in line 688.
    - Replace line 688 with a call to the newly created `get_trainer`.
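The trainer selection described for `get_trainer` can be sketched in isolation. This is a minimal, self-contained sketch, assuming hypothetical stand-in classes: the `GANTrainer` and `MultiGPUTrainer` below are simple dataclasses, not the real tensorpack/TGAN trainers (which take more arguments), and `TGANModelSketch` with its `detected_gpus` parameter is invented so the example runs without TensorFlow:

```python
from dataclasses import dataclass
from typing import Any, List, Optional


# Hypothetical stand-ins for the real trainers; the actual GANTrainer and
# tgan.trainer.MultiGPUTrainer take more arguments than shown here.
@dataclass
class GANTrainer:
    model: Any
    input_queue: Any


@dataclass
class MultiGPUTrainer:
    model: Any
    input_queue: Any
    gpus: List[int]  # proposed: list of GPU ids, as the last argument


class TGANModelSketch:
    """Only the GPU-related parts of the proposed TGANModel changes."""

    def __init__(self, gpus: Optional[List[int]] = None,
                 detected_gpus: Optional[List[int]] = None):
        # `detected_gpus` replaces self.get_gpus() so the sketch runs
        # without TensorFlow; an empty list means "CPU only".
        self.gpus = gpus or detected_gpus or []
        self.model = object()  # placeholder for the TGAN graph

    def get_trainer(self, input_queue):
        # Several GPUs: use the multi-GPU trainer with all of them.
        if len(self.gpus) > 1:
            return MultiGPUTrainer(self.model, input_queue, self.gpus)
        # One GPU or none: a single-device trainer. TensorFlow places the ops
        # on the visible GPU if there is one, otherwise on the CPU.
        return GANTrainer(self.model, input_queue)


# Usage
print(type(TGANModelSketch(gpus=[0, 1]).get_trainer(None)).__name__)  # MultiGPUTrainer
print(type(TGANModelSketch(gpus=[0]).get_trainer(None)).__name__)     # GANTrainer
print(type(TGANModelSketch().get_trainer(None)).__name__)             # GANTrainer
```

Explicit GPU ids take precedence, and auto-detection is only the fallback, which matches the "select the best available option in case none is provided" requirement at the top of the issue.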
Hi @ManuelAlvarezC, how is this going? I would like to train with TGAN, but that's kinda hard without GPU.
Hi @Baukebrenninkmeijer,
We are currently planning the next steps on the project, so I can't give you an exact answer. However, we are open to contributions :slightly_smiling_face:, so if you are interested, we can discuss the implementation details here.
Hi @ManuelAlvarezC,
This is my first project with tensorpack, so I'm still getting used to working with it.
However, the current code already selects the GPU automatically and has the option to pass a `gpu='/gpu:0'` parameter or similar. I had not seen this when I commented on the issue, so I'm wondering how the issue relates to these options already being available. I also saw code for a multi-GPU setup, which is also mentioned here.
Hi @Baukebrenninkmeijer
> This is my first project with tensorpack, so I'm still getting used to working with it.

I'm sure this won't be a big issue.

> However, the current code already selects the GPU automatically and has the option to pass a `gpu='/gpu:0'` parameter or similar. I had not seen this when I commented on the issue, so I'm wondering how the issue relates to these options already being available. I also saw code for a multi-GPU setup, which is also mentioned here.
Indeed, most of the code is already there; what is missing is orchestrating everything. I have updated the original issue with the implementation details, so if you are interested, please let me know and I will assign this issue to you. If you want to start working on it, I recommend you take a look at our CONTRIBUTING guide.
Hi Manuel,
Is it possible to run TGAN on GPU? I am finding it is running slowly on my CPU.
If I uninstall tensorflow and TGAN and then install tensorflow-gpu and reinstall TGAN, will TGAN then automatically train on my GPU? GPU training is already working for other neural nets on my machine.
Thanks,
George
@georgeMcMahon TGAN should be independent of your tensorflow installation. Tensorflow should automatically use the GPU if it is available, so make sure this is the case: check that you have the correct NVIDIA drivers, CUDA toolkit, cuDNN and tensorflow versions, and that tensorflow has access to the GPU.
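For the last check, a minimal sketch that lists the GPU devices TensorFlow can see, using the same `device_lib` module that appears in the issue description. The helper name `list_visible_gpus` is made up for illustration, and it returns an empty list instead of failing when TensorFlow is not installed:

```python
def list_visible_gpus():
    """Return the names of the GPU devices TensorFlow can see.

    Returns an empty list if TensorFlow is not installed, if it is the
    CPU-only build, or if it cannot load the CUDA/cuDNN libraries.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == 'GPU']


if __name__ == '__main__':
    print(list_visible_gpus())
```

If this prints an empty list on a machine with a GPU, TensorFlow is most likely the CPU-only build or cannot find a compatible CUDA/cuDNN installation.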
What library are you using for your other neural nets? Also tensorflow? If so, there might be a version mismatch or something.
Hope this helps!
Hi @georgeMcMahon,
> Is it possible to run TGAN on GPU?

Yes, it is. As @Baukebrenninkmeijer mentions in the comment above, TGAN is independent of your tensorflow installation.

> If I uninstall tensorflow and TGAN and then install tensorflow-gpu and reinstall TGAN, will TGAN then automatically train on my GPU?

Yes, I think this should work. This issue is open not because TGAN is unable to work with `tensorflow-gpu`, but because we want it to work out of the box in all three scenarios (CPU, GPU and multiGPU).