
Multi-GPU allocation

Open · jdumont0201 opened this issue on Oct 20, 2021 · 0 comments

Hi,

I'm trying to understand device allocation here. I have GPUs with different capacities, and the program stops with an OOM during backward() even when free memory is available.

In the code, I see two parts that are critical for GPU allocation:

  1. In class StyleTransfer, you create a device plan to spread the load of the ~27 layers of VGG over the GPUs:
    if len(self.devices) == 1:
        device_plan = {0: self.devices[0]}
    elif len(self.devices) == 2:
        device_plan = {0: self.devices[0], 5: self.devices[1]}

meaning you send the first 5 layers to GPU 0 and all the others to GPU 1.

  2. In the stylize main loop, you actually send all images and styles to GPU 0:
    self.image = self.image.to(self.devices[0])
    content = content.to(self.devices[0])

so I'm not sure whether the load is spread at all during the backward pass. (A sketch of how I understand the split to behave follows below.)
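For reference, here is a minimal toy of how I picture the split behaving — the names and shapes are made up, not the repo's actual code. Even though the input starts on GPU 0, autograd records the .to() hop, so backward() retraces it and computes gradients (and consumes the saved activations) on each layer's own device:

    import torch
    from torch import nn

    # Hypothetical two-GPU toy mimicking a plan like
    # {0: 'cuda:0', 5: 'cuda:1'}: first half on cuda:0, rest on cuda:1.
    dev0, dev1 = torch.device('cuda:0'), torch.device('cuda:1')
    head = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev0)
    tail = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev1)

    x = torch.randn(64, 512, device=dev0, requires_grad=True)
    h = head(x)             # forward runs on cuda:0
    h = h.to(dev1)          # the hop is recorded in the autograd graph
    loss = tail(h).sum()    # forward runs on cuda:1
    loss.backward()         # backward retraces the hop: gradients and
                            # saved activations for tail live on cuda:1

If that toy is right, the OOM during backward() would hit whichever device holds the layers with the largest saved activations, not necessarily GPU 0.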

How would you design a version where the load is spread in proportion to each GPU's capacity?

  • Would you fill GPU 0 until it is almost full, and only then send the remaining data to GPU 1?
  • Or would you keep the ratio of data on each GPU well balanced during the whole process (see the sketch below)?
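For the second option, I imagine the split point could be derived from each card's memory. Here is a rough sketch of what I mean (capacity_plan is a hypothetical helper; it uses total_memory as a proxy for capacity, whereas real code would probably want the free memory from torch.cuda.mem_get_info):

    import torch

    def capacity_plan(num_layers, devices):
        """Hypothetical: assign contiguous layer ranges in
        proportion to each GPU's total memory."""
        mems = [torch.cuda.get_device_properties(d).total_memory
                for d in devices]
        total = sum(mems)
        plan, start = {}, 0
        for dev, mem in zip(devices, mems):
            plan[start] = dev
            start += max(1, round(num_layers * mem / total))
        return plan

    # e.g. a 24 GB card paired with an 8 GB card over ~27 layers
    # would give roughly {0: 'cuda:0', 20: 'cuda:1'}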

Regards, J

jdumont0201 · Oct 20 '21 15:10