Inquiry on FBPIC Multi-GPU Parallel Acceleration Failure
We are currently simulating laser-wakefield acceleration with self-injection in FBPIC and have run into an unexpected issue when attempting multi-GPU parallelization. Despite using four NVIDIA Tesla V100 16GB GPUs, the computational time per simulation remains essentially identical to that of a single-GPU run (approximately 2 hours), i.e. we see no acceleration benefit from the multi-GPU setup.
In preliminary tests we have verified that each of the four GPUs works correctly on its own. We therefore suspect that the issue lies in our parallel configuration or in the algorithmic logic within FBPIC. We would greatly appreciate your guidance on potential solutions.
Are there known scalability bottlenecks in FBPIC for parallel computing? Are there specific key parameters that require optimization in multi-GPU scenarios? How can we effectively achieve multi-GPU parallel acceleration?
Thanks for this question @kelly1122i.
Indeed, it is entirely possible that an FBPIC simulation sees no significant speedup from a multi-GPU setup, because of the overhead involved in communicating fields and particles between the different GPUs. In practice, whether or not you will see a speedup mostly depends on the size of the simulation (larger simulations are more likely to benefit). This is explained in more detail on this page of the documentation:
https://fbpic.github.io/overview/parallelisation.html
and in particular in the part of that page that discusses when a multi-GPU run can be expected to give a speedup.
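To make this concrete, here is a minimal sketch of how a multi-GPU FBPIC run is typically set up, with one MPI rank per GPU. The grid sizes, the script name `lwfa_script.py`, and the exact keyword arguments are placeholders/assumptions (the `Simulation` keywords vary slightly between FBPIC versions), so adapt them to your actual input script. The points that matter for multi-GPU scaling are `use_cuda=True` and a finite stencil order (e.g. `n_order=32`), which reduces the number of guard cells exchanged between domains:

```python
# Hypothetical launch command (one MPI rank per GPU):
#   mpirun -np 4 python lwfa_script.py
#
# Sketch of the relevant parts of the input script; all numbers are placeholders.
from scipy.constants import c
from fbpic.main import Simulation

Nz, Nr = 3200, 200            # longitudinal / radial grid points (placeholders)
zmin, zmax = -40.e-6, 0.e-6   # simulation box in z (placeholders)
rmax = 50.e-6                 # simulation box in r (placeholder)
Nm = 2                        # number of azimuthal modes
dt = (zmax - zmin) / Nz / c   # timestep of order dz/c

sim = Simulation(
    Nz, zmax, Nr, rmax, Nm, dt,
    zmin=zmin,
    n_order=32,      # finite stencil order: fewer guard cells, less inter-GPU communication
    use_cuda=True,   # each MPI domain runs on one GPU
    boundaries={'z': 'open', 'r': 'reflective'},
)
# ... add laser, plasma species, moving window and diagnostics as in your original script ...
# sim.step(N_step)
```

With 4 MPI ranks the longitudinal grid is split into 4 domains plus guard cells, so roughly speaking, if the per-domain grid is not much larger than the guard region, the communication overhead can cancel out the gain from parallelization.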
One thing that you could try is to activate GPU-aware MPI, as described here: https://fbpic.github.io/how_to_run.html
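A minimal sketch of what this could look like is below, assuming your MPI installation is CUDA-aware (e.g. OpenMPI built with CUDA support). The environment variable name `FBPIC_ENABLE_GPUDIRECT` is my recollection of the setting described on that page, so please verify it there:

```python
# Typical job-script usage (assumption -- check the linked page for the exact name):
#   export FBPIC_ENABLE_GPUDIRECT=1
#   mpirun -np 4 python lwfa_script.py
#
# Equivalently, the variable can be set at the top of the input script,
# before fbpic is imported:
import os
os.environ['FBPIC_ENABLE_GPUDIRECT'] = '1'   # assumption: name as per the FBPIC docs

from fbpic.main import Simulation
# ... rest of the input script unchanged ...
```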