TGAN

Add support for GPU and multiGPU

Open · ManuelAlvarezC opened this issue 6 years ago · 7 comments

We should support GPU and multi-GPU training and sampling, allowing users to:

  • Select the desired behavior with a parameter
  • Select the best available option in case none is provided
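The two requirements above reduce to a small decision. Use the GPUs the caller asks for, otherwise use whatever is detected, and then pick CPU, single-GPU or multi-GPU mode from the count. The following is a minimal pure-Python sketch of that decision. `resolve_devices` and its `detect` callback are hypothetical names. `detect` stands in for a real GPU probe and is not part of TGAN.

```python
from typing import Callable, List, Optional, Tuple


def resolve_devices(
    gpus: Optional[List[int]] = None,
    detect: Callable[[], List[int]] = lambda: [],
) -> Tuple[str, List[int]]:
    """Pick a training mode from an explicit GPU list or from auto-detection.

    Hypothetical helper: `detect` stands in for a real GPU probe.
    """
    # An explicit list wins; None means "choose the best available option".
    chosen = list(gpus) if gpus is not None else list(detect())
    if len(chosen) > 1:
        return "multi-gpu", chosen
    if len(chosen) == 1:
        return "gpu", chosen
    return "cpu", chosen


print(resolve_devices([0, 1]))              # ('multi-gpu', [0, 1])
print(resolve_devices(detect=lambda: [0]))  # ('gpu', [0])
print(resolve_devices([]))                  # ('cpu', [])
```

In this sketch, an explicit empty list forces CPU mode. An expression like `gpus or detect()` would instead treat `[]` the same as `None` and fall back to auto-detection.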

EDIT

The required changes affect two scopes of the project:

  • During installation time: when installing the package, the user needs to choose between tensorflow and tensorflow-gpu and install it before proceeding with the installation of TGAN.

    • In README.md, we need to add a new subsection called Tensorflow under Requirements. It should explain that users need to install TensorFlow on their own, choosing between the two packages, and that if in doubt they should install the non-GPU version.

    • In setup.py, we need to remove the tensorflow dependency from install_requires.

  • During execution time: when creating the TGANModel instance, we should check the gpu argument and, depending on its value, use one trainer or the other.

    • In tgan.trainer.MultiGPUTrainer:

      • Change the __init__ signature so the arguments follow the same order as GANTrainer (nr_gpu at the end).
      • Replace the nr_gpu argument with gpus, which will expect a list of integers.
      • Adapt the rest of the method to the new format of gpus (lines 117, 120, 136, 142, 143)
    • In tgan.model.TGANModel:

      • Add a method get_gpus on TGANModel with the following code:
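          from tensorflow.python.client import device_lib
      
          @staticmethod
          def get_gpus():
              # Device names look like '/device:GPU:0'; locality.bus_id is the bus locality, not the GPU index.
              return [int(x.name.split(':')[-1]) for x in device_lib.list_local_devices()
                      if x.device_type == 'GPU']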
      
      • In TGANModel.__init__, rename the argument gpu to gpus and change line 627 to:
          self.gpus = gpus or self.get_gpus()
          if self.gpus:
              # assumes `import os` at the top of the module
              os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(str(gpu) for gpu in self.gpus)
      
      
      • Add a method get_trainer(self, input_queue) that will:

        • If self.gpus contains more than one GPU, return a new instance of MultiGPUTrainer instantiated with self.gpus.
        • In any other case, return a GANTrainer as in line 688
      • Replace line 688 with a call to the newly created get_trainer.
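Two CUDA behaviours matter when the chosen ids are passed through CUDA_VISIBLE_DEVICES, as in the MultiGPUTrainer and TGANModel changes listed above. First, the visible devices are renumbered from zero, so the trainer's towers would use indices 0..len(gpus)-1 rather than the physical ids. Second, CUDA reads the variable when it initializes. `device_lib.list_local_devices()` already triggers that initialization, so the variable generally needs to be set before TensorFlow first enumerates the GPUs. The sketch below is pure Python. `cuda_visible_devices` and `tower_indices` are hypothetical helper names and are not part of TGAN or tensorpack.

```python
import os
from typing import List, Sequence


def cuda_visible_devices(gpus: Sequence[int]) -> str:
    """Build a CUDA_VISIBLE_DEVICES value such as "2,3" from physical GPU ids."""
    return ",".join(str(gpu) for gpu in gpus)


def tower_indices(gpus: Sequence[int]) -> List[int]:
    """Logical GPU indices the process sees after CUDA_VISIBLE_DEVICES is applied.

    CUDA renumbers the visible devices from 0, so with CUDA_VISIBLE_DEVICES="2,3"
    TensorFlow exposes /gpu:0 and /gpu:1 rather than /gpu:2 and /gpu:3.
    """
    return list(range(len(gpus)))


gpus = [2, 3]
# Set this before TensorFlow initializes CUDA; setting it afterwards
# generally has no effect on the current process.
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices(gpus)
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 2,3
print(tower_indices(gpus))                 # [0, 1]
```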

ManuelAlvarezC, May 06 '19

Hi @ManuelAlvarezC, how is this going? I would like to train with TGAN, but that's kind of hard without a GPU.

Baukebrenninkmeijer, May 20 '19

Hi @Baukebrenninkmeijer,

We are currently planning the next steps for the project, so I can't give you an exact answer. However, we are open to contributions 🙂, so if you are interested, we can discuss the implementation details here.

ManuelAlvarezC, May 22 '19

Hi @ManuelAlvarezC,

This is my first project with tensorpack, so I'm still getting used to working with it.

However, the current code already selects the GPU automatically and has the option to pass a gpu='/gpu:0' parameter or similar. I had not seen this when I commented on the issue, so I'm wondering how this issue relates to these options already being available. I also saw code for a multi-GPU setup, which is also mentioned here.

Baukebrenninkmeijer, May 23 '19

Hi @Baukebrenninkmeijer

> This is my first project with tensorpack, so I'm still getting used to working with it.

I'm sure this won't be a big issue.

> However, the current code already selects the GPU automatically and has the option to pass a gpu='/gpu:0' parameter or similar. I had not seen this when I commented on the issue, so I'm wondering how this issue relates to these options already being available. I also saw code for a multi-GPU setup, which is also mentioned here.

Indeed, most of the code is already there; what is missing is orchestrating everything. I have updated the original issue with the implementation details, so if you are interested, please let me know and I will assign this issue to you. If you want to start working on it, I recommend taking a look at our CONTRIBUTING guide.

ManuelAlvarezC, May 23 '19

Hi Manuel,

Is it possible to run TGAN on a GPU? It is running slowly on my CPU.

If I uninstall tensorflow and TGAN, then install tensorflow-gpu and reinstall TGAN, will TGAN automatically train on my GPU? GPU training is already working for other neural nets on my machine.

thanks,

George

eddieHou, Aug 19 '19

@georgeMcMahon TGAN should be independent of your TensorFlow installation. TensorFlow should automatically use the GPU if one is available, so make sure this is the case. Check that you have compatible versions of the NVIDIA drivers, CUDA toolkit, cuDNN and TensorFlow, and that TensorFlow has access to the GPU.
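One quick way to check the last point is to ask TensorFlow what it can see. Below is a minimal sketch. The function name `tensorflow_gpu_report` is hypothetical. It relies only on `tf.test.is_built_with_cuda()` and `device_lib.list_local_devices()`, which exist in both TensorFlow 1.x and 2.x. It returns an empty GPU list when TensorFlow is not installed.

```python
def tensorflow_gpu_report():
    """Return a small dict describing TensorFlow's view of the local GPUs."""
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError:
        # Neither the CPU nor the GPU build of TensorFlow is installed here.
        return {"installed": False, "built_with_cuda": False, "gpus": []}

    gpus = [
        d.name  # e.g. "/device:GPU:0"
        for d in device_lib.list_local_devices()
        if d.device_type == "GPU"
    ]
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": gpus,
    }


print(tensorflow_gpu_report())
```

If `built_with_cuda` is False, the CPU-only tensorflow package is installed. If it is True but `gpus` is empty, the driver, CUDA or cuDNN versions most likely do not match the TensorFlow build.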

What library are you using for your other neural nets? Also TensorFlow? If so, there might be a version mismatch or something similar.

Hope this helps!

Baukebrenninkmeijer, Aug 20 '19

Hi @georgeMcMahon,

> Is it possible to run TGAN on a GPU?

Yes, it is. As @Baukebrenninkmeijer mentions in the comment above, TGAN is independent of your TensorFlow installation.

> If I uninstall tensorflow and TGAN, then install tensorflow-gpu and reinstall TGAN, will TGAN automatically train on my GPU?

Yes, I think this should work. This issue is open not because TGAN is unable to work with tensorflow-gpu, but because we want it to work in all three scenarios (CPU, GPU and multi-GPU) out of the box.

ManuelAlvarezC, Aug 21 '19