
Implementing multi-GPUs Training for RecBole

Open juyongjiang opened this issue 4 years ago • 3 comments

Hi there,

I am a fan of the RecBole framework. Considering the complexity of the RecBole framework, I propose a simple but feasible method to achieve multi-GPU training. The core idea is to re-wrap RecBole's internal Interaction data type into a PyTorch DataLoader object. For more details, please check my pull request branch "fix_multi_gpus".
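The re-wrapping idea could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the field names and the InteractionDataset class are hypothetical, and a real multi-GPU setup would additionally pass a DistributedSampler to the DataLoader.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class InteractionDataset(Dataset):
    """Wraps a dict of equal-length tensors (an Interaction-like
    structure) so a standard PyTorch DataLoader can index and batch it."""
    def __init__(self, interaction):
        self.interaction = interaction
        self.length = len(next(iter(interaction.values())))

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Return one row across all fields; the default collate_fn
        # stacks these per-key dicts back into batched tensors.
        return {k: v[idx] for k, v in self.interaction.items()}

# Hypothetical Interaction-like data: two fields of length 6.
data = {
    "user_id": torch.arange(6),
    "item_id": torch.arange(6) * 10,
}
loader = DataLoader(InteractionDataset(data), batch_size=2)
# In a distributed run, a torch.utils.data.distributed.DistributedSampler
# would be passed here so each process sees a distinct shard.
batches = list(loader)
```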

Note that this is just one promising way to realize multi-GPU training. I hope this method can inspire you to come up with an even better approach.

To train a model with multiple GPUs (e.g., BERT4Rec), you just need to do the following:

  1. Set multi_gpus: True in your config.yaml file.
  2. $ python -m torch.distributed.launch --nproc_per_node=3 run_recbole.py --model=BERT4Rec --config_files recbole/properties/model/BERT4Rec.yaml
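For context on step 2, torch.distributed.launch spawns nproc_per_node processes and passes each one a --local_rank argument, which the launched script is expected to parse and use to bind to the matching GPU. A minimal sketch of that general PyTorch pattern (not RecBole's actual code) looks like this:

```python
import argparse
import torch

# torch.distributed.launch passes --local_rank to each spawned process;
# with --nproc_per_node=3 the ranks are 0, 1, and 2.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)
args, _ = parser.parse_known_args()

if torch.cuda.is_available():
    # Bind this process to its own GPU before any CUDA work happens.
    torch.cuda.set_device(args.local_rank)

# torch.distributed.init_process_group("nccl") would follow here,
# before wrapping the model in torch.nn.parallel.DistributedDataParallel.
print(f"running as local rank {args.local_rank}")
```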

Best Regards, John

juyongjiang avatar Sep 10 '21 11:09 juyongjiang

@juyongjiang Hi, thanks for your PR and we will carefully check it.

2017pxy avatar Sep 10 '21 12:09 2017pxy

@juyongjiang Cool!

hunkim avatar Sep 10 '21 12:09 hunkim

Hello, I used your method to implement multi-GPU training on KGAT, but after setting multi_gpus: True, the parameter doesn't seem to take effect, as it isn't printed in the log. Is there any other setting I have missed?

KlaineWei avatar Sep 17 '22 07:09 KlaineWei