
Multi-gpu version

Open mengliu1998 opened this issue 3 years ago • 4 comments

Dear authors,

Thank you for this awesome work. I found the provided examples very clear. However, I found it non-trivial to apply FLAG to multi-GPU data-parallel training using DataParallel from PyG. Do you have any ideas on how to implement FLAG in such a multi-GPU training pipeline?

Thank you.

mengliu1998 avatar Aug 11 '21 22:08 mengliu1998

Hi Meng,

Thanks for your interest in our work and for using our code. Yes, using FLAG with multiple GPUs is definitely interesting and has been on my mind for a while. Unfortunately, I haven't actually tried implementing it, since the experiments I ran on OGB didn't require it, but I do think it's a future direction worth pursuing.

Since it seems you've already gotten your hands dirty on this problem, do you have any ideas to share? Do you know what is blocking us from using the parallel library provided by PyG?

Thanks!

devnkong avatar Aug 12 '21 17:08 devnkong

Hi @devnkong,

Thank you for your quick response.

DataParallel from PyG should be used with DataListLoader. In this case, each iteration of the dataloader returns a list of torch_geometric.data.Data objects, which is split across multiple GPUs; on each GPU the sublist is organized into a torch_geometric.data.Batch object.

So updating the perturbation for a batch that is split across multiple GPUs seems difficult.
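
For reference, a minimal sketch of the pipeline I mean (here `dataset` and `Net` are placeholders, and the DataListLoader import path may differ across PyG versions):

```python
import torch
from torch_geometric.nn import DataParallel
from torch_geometric.loader import DataListLoader  # older PyG: torch_geometric.data.DataListLoader

# `dataset` and `Net` are placeholders for an arbitrary PyG dataset and model.
loader = DataListLoader(dataset, batch_size=32, shuffle=True)
model = DataParallel(Net()).to(torch.device('cuda'))

for data_list in loader:    # a plain Python list of torch_geometric.data.Data objects
    out = model(data_list)  # DataParallel splits the list across GPUs and builds a
                            # separate Batch on each device inside forward()
    # A single perturbation tensor sized for the full batch has no obvious place
    # to live here, since node features are only concatenated per device.
```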

mengliu1998 avatar Aug 12 '21 18:08 mengliu1998

I see. My suggestion is that you try the PyG multi-GPU example directly, in the straightforward manner. For adversarial training in CV and NLP, implementations similar to FLAG work fine on multiple GPUs, so I guess it shouldn't be a big issue for graphs either? If you have already tried, please let me know, thanks!
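
For concreteness, the single-GPU FLAG step you would be dropping into that example looks roughly like the sketch below. This is not the repo code verbatim: `step_size`, `m`, and the model/criterion signatures are placeholders, and the open question is where `perturb` should live once the batch is scattered across devices.

```python
import torch

def flag_step(model, x, edge_index, y, optimizer, criterion, step_size=1e-3, m=3):
    # Rough sketch of the FLAG inner loop on a single device.
    model.train()
    optimizer.zero_grad()

    # Initialize a uniform perturbation on the node features and track its gradient.
    perturb = torch.empty_like(x).uniform_(-step_size, step_size).requires_grad_()
    out = model(x + perturb, edge_index)
    loss = criterion(out, y) / m

    for _ in range(m - 1):
        loss.backward()
        # Ascent step on the perturbation, then reset its gradient.
        perturb.data = perturb.detach() + step_size * torch.sign(perturb.grad.detach())
        perturb.grad[:] = 0
        out = model(x + perturb, edge_index)
        loss = criterion(out, y) / m

    loss.backward()
    optimizer.step()
    return loss.item()
```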

devnkong avatar Aug 12 '21 18:08 devnkong

Thanks. I will try it.

mengliu1998 avatar Aug 12 '21 19:08 mengliu1998