Add profiler and optimize the parallelism for simulation

Open Ryan0v0 opened this issue 3 years ago • 1 comments

Reference Issues/PRs

Inspired by #780

What does this implement/fix? Explain your changes.

This PR tracks VRAM usage by polling for information every 0.7 seconds in a separate thread. A lock guarantees that the threads stay synchronised. Each client profiling thread is bound to the ID of a particular client and runs in parallel with the local training routine.
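A minimal sketch of the polling pattern described above, not the code in this PR. The class name `ClientProfiler` and the `read_vram_mb` callback are hypothetical; the callback stands in for whatever actually queries the GPU.

```python
import threading
import time
from typing import Callable, List


class ClientProfiler:
    """Polls a VRAM reading in a background thread while one client trains."""

    def __init__(self, cid: str, read_vram_mb: Callable[[], float],
                 interval: float = 0.7) -> None:
        self.cid = cid                      # client this profiler is bound to
        self.read_vram_mb = read_vram_mb    # hypothetical metric source
        self.interval = interval            # seconds between samples
        self.samples: List[float] = []
        self._lock = threading.Lock()       # guards access to `samples`
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Sample once immediately, then again every `interval` seconds
        # until stop() sets the event.
        while True:
            value = self.read_vram_mb()
            with self._lock:
                self.samples.append(value)
            if self._stop.wait(self.interval):
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> float:
        """Stop polling and return the peak reading seen so far."""
        self._stop.set()
        self._thread.join()
        with self._lock:
            return max(self.samples)


# Usage: a constant reading and a short sleep stand in for real training.
profiler = ClientProfiler("0", read_vram_mb=lambda: 1024.0, interval=0.01)
profiler.start()
time.sleep(0.05)
peak = profiler.stop()
```

Using `Event.wait` as the sleep lets `stop()` interrupt the wait right away instead of blocking for up to a full interval.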

It can significantly reduce wall-clock time when simulating FL workloads, because resources are assigned according to measured usage.
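As an illustration of how a profiled peak could feed back into resource assignment, here is a sketch under stated assumptions. The function `gpu_fraction_per_client` and its `headroom` parameter are hypothetical and not part of this PR; the result is the kind of fractional GPU share that a simulation could request per client.

```python
import math


def gpu_fraction_per_client(peak_vram_mb: float, total_vram_mb: float,
                            headroom: float = 0.1) -> float:
    """Return the GPU share one client needs, given its profiled peak VRAM.

    `headroom` is an assumed safety margin added on top of the peak.
    """
    if peak_vram_mb <= 0 or total_vram_mb <= 0:
        raise ValueError("VRAM values must be positive")
    budget = peak_vram_mb * (1.0 + headroom)
    # At least one client per GPU, even if the budget exceeds total memory.
    clients_per_gpu = max(1, math.floor(total_vram_mb / budget))
    return 1.0 / clients_per_gpu


# Usage: a client peaking at 2000 MB on a 16000 MB GPU, no safety margin.
fraction = gpu_fraction_per_client(2000, 16000, headroom=0.0)
```

With a fixed guess instead of a measured peak, the share is either too large (fewer clients run concurrently) or too small (out-of-memory failures), which is where the wall-clock saving would come from.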

Any other comments?

This profiling and optimization are summarized in the paper Protea: Client Profiling within Federated Systems using Flower. Feel free to check it out for more details.

Aug 01 '22 08:08 Ryan0v0

Thanks @danieljanes and @tanertopal for the comments and suggestions! I think I have resolved most of them except for the package vendoring.

Regarding the discussion about packages, I chose to use both GPUtil and psutil because they are "on the same level". psutil is the library for retrieving information on running processes and system utilization on the CPU, while GPUtil does the same thing for the GPU, which keeps the code clear for readers. However, GPUtil cannot get metrics for one particular GPU. That's why I imported the lower-level package nvsmi to get a specific pid.
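For reference, a sketch of per-pid VRAM lookup that avoids the extra packages by calling the `nvidia-smi` CLI directly. This is an alternative shown for illustration, not what the PR does; the helper names `parse_compute_apps` and `vram_by_pid` are hypothetical, and `vram_by_pid` assumes `nvidia-smi` is on the PATH.

```python
import subprocess
from typing import Dict


def parse_compute_apps(csv_text: str) -> Dict[int, int]:
    """Parse 'pid, used_memory' CSV rows into {pid: MiB used}."""
    usage: Dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2 or not fields[0].isdigit() or not fields[1].isdigit():
            continue  # skip blank rows and values such as "[N/A]"
        pid, mem_mib = int(fields[0]), int(fields[1])
        # A process can appear once per GPU it uses, so sum the rows.
        usage[pid] = usage.get(pid, 0) + mem_mib
    return usage


def vram_by_pid() -> Dict[int, int]:
    """Query nvidia-smi for the memory used by each compute process."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

A client profiling thread could then read `vram_by_pid().get(os.getpid(), 0)` on each poll, at the cost of spawning a subprocess every time.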

Actually, we didn't use the metrics we retrieved from psutil in our resource re-allocation. If we are going to remove GPUtil, should we still keep the psutil package and the related code that gets the CPU utilization percentage?

Looking forward to hearing your thoughts @tanertopal @danieljanes @pedropgusmao! Please let me know if you have any other questions or changes you would suggest.

Aug 24 '22 15:08 Ryan0v0