
How to do inference using multiple GPUs for Styleformer

Open pratikchhapolika opened this issue 3 years ago • 6 comments

I am using this model to run inference on 1 million data points on A100 GPUs (4 GPUs in total). I am launching an inference.py script using Google's Vertex AI container.

How can I make the inference code utilize all 4 GPUs so that inference is super fast?

Here is the code I use in inference.py:

from styleformer import Styleformer
import torch
import warnings

warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style=1)

def set_seed(seed):
    # Seed CPU and all visible GPUs for reproducible candidate selection
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(1212)

source_sentences = [
    "I would love to meet attractive men in town",
    "Please leave the room now",
    "It is a delicious icecream",
    "I am not paying this kind of money for that nonsense",
    "He is on cocaine and he cannot be trusted with this",
    "He is a very nice man and has a charming personality",
    "Let us go out for dinner",
    "We went to Barcelona for the weekend. We have a lot of things to tell you.",
]

for source_sentence in source_sentences:
    # inference_on = [0=Regular model On CPU, 1=Regular model On GPU, 2=Quantized model On CPU]
    target_sentence = sf.transfer(source_sentence, inference_on=1, quality_filter=0.95, max_candidates=5)
    print("[Formal] ", source_sentence)
    if target_sentence is not None:
        print("[Casual] ", target_sentence)
    else:
        print("No good quality transfers available!")
    print("-" * 100)

pratikchhapolika avatar Apr 25 '22 06:04 pratikchhapolika

Leverage the `inference_on` parameter; I have updated it to make multi-GPU usage more intuitive now. `-1` is reserved for the CPU and `0` through `998` are reserved for GPUs. The following snippet will get you the number of visible CUDA devices you have.

import torch

num_of_gpus = torch.cuda.device_count()
print(num_of_gpus)

You just have to pass a value in range(num_of_gpus), i.e. 0 to <your_max_devices>, as inference_on. Behind the scenes the CUDA devices are used as cuda:0, cuda:1, etc., up to num_of_gpus - 1.

Just write a function that wraps Styleformer inference with the device index as one of the params and invoke it using simple Python multiprocessing. The number of processes can be equal to the number of devices. Each process will run Styleformer inference with its respective device index, e.g. P0 will run on cuda:0, P1 will run on cuda:1, and so on. Handle how you want to store the inference results internally.
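
A minimal sketch of that pattern (the worker function, chunking scheme, and sentence list below are illustrative assumptions, not part of Styleformer's API):

import torch
import torch.multiprocessing as mp
from styleformer import Styleformer

def run_on_device(device_index, sentences, return_dict):
    # Each process loads its own Styleformer and pins it to one GPU via inference_on
    sf = Styleformer(style=1)
    outputs = []
    for sentence in sentences:
        outputs.append(sf.transfer(sentence, inference_on=device_index,
                                   quality_filter=0.95, max_candidates=5))
    return_dict[device_index] = outputs

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # required when using CUDA with multiprocessing
    num_of_gpus = torch.cuda.device_count()

    all_sentences = ["Please leave the room now", "Let us go out for dinner"]  # your full dataset
    chunks = [all_sentences[i::num_of_gpus] for i in range(num_of_gpus)]  # one chunk per GPU

    manager = mp.Manager()
    return_dict = manager.dict()
    processes = [mp.Process(target=run_on_device, args=(i, chunks[i], return_dict))
                 for i in range(num_of_gpus)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print({i: len(v) for i, v in return_dict.items()})  # number of results per device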

PrithivirajDamodaran avatar Apr 25 '22 12:04 PrithivirajDamodaran

So for 4 GPUs, it would be:

target_sentence = sf.transfer(source_sentence, inference_on=4, quality_filter=0.95, max_candidates=5)

pratikchhapolika avatar Apr 25 '22 14:04 pratikchhapolika

@pratikchhapolika It sounds like you'll need to fire up a separate process for each GPU and pass in inference_on=0, inference_on=1, inference_on=2, and inference_on=3, respectively, using multiprocessing.

@PrithivirajDamodaran What I would like to know is how one can batchify Styleformer inference tasks to make efficient use of GPUs that have 48GB or 80GB each.
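
For context, what I am imagining is something like the following, dropping down to the underlying transformers checkpoint directly, since sf.transfer() takes one sentence at a time. The checkpoint name and task prefix here are guesses for illustration, not confirmed Styleformer internals:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint and prompt prefix; Styleformer's actual internals may differ
model_name = "prithivida/informal_to_formal_styletransfer"
prefix = "transfer Casual to Formal: "

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to("cuda:0").eval()

sentences = ["i would love to meet attractive men in town", "let us go out for dinner"]
batch = tokenizer([prefix + s for s in sentences], return_tensors="pt",
                  padding=True).to("cuda:0")
with torch.no_grad():
    generated = model.generate(**batch, max_length=64, num_beams=5)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))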

mhillebrand avatar Apr 25 '22 20:04 mhillebrand

So for 4 GPUs, it would be:

target_sentence = sf.transfer(source_sentence, inference_on=4, quality_filter=0.95, max_candidates=5)

@PrithivirajDamodaran, can you please confirm this?

Will a distributed launch work here?

Like this:

python -m torch.distributed.launch --nproc_per_node 4 inference.py
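
If so, I assume each launched copy of inference.py would still have to pick its device and its slice of the data itself, e.g. from LOCAL_RANK (newer PyTorch launchers set this environment variable; older ones pass --local_rank as an argument instead). A rough sketch, not an endorsed approach:

import os
from styleformer import Styleformer

# torch.distributed.launch spawns one copy of this script per process
local_rank = int(os.environ.get("LOCAL_RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))

sf = Styleformer(style=1)

source_sentences = ["Please leave the room now", "Let us go out for dinner"]  # full dataset
my_share = source_sentences[local_rank::world_size]  # every world_size-th sentence

for sentence in my_share:
    target = sf.transfer(sentence, inference_on=local_rank,
                         quality_filter=0.95, max_candidates=5)
    print(f"[rank {local_rank}]", sentence, "->", target)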

pratikchhapolika avatar Apr 26 '22 04:04 pratikchhapolika

Yes, it can be batched. Will add that patch now.

PrithivirajDamodaran avatar Jun 30 '22 04:06 PrithivirajDamodaran

@PrithivirajDamodaran How's the batch patch coming along?

mhillebrand avatar Sep 03 '22 22:09 mhillebrand