ml-agents
multiprocessing for inference and training, to make full use of CPU and GPU.
We are developing a basketball game in which the policies (or agents/brains) are essentially fixed within a match, say 3 vs 3. To speed up training we set num-envs=20 (or more), but it doesn't help much. There seems to be a bottleneck in the main process, and we noticed that CPU (i9) and GPU (RTX 3090) usage are very low:
- one CPU core is at 100%, the others are nearly idle;
- GPU usage is around 10%.
So how can we make full use of the CPU and GPU to speed up training?
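A small script along these lines (psutil and pynvml assumed installed; not part of ml-agents) is one way to watch per-core CPU and GPU utilization while training runs:

import time
import psutil
import pynvml

# Query the first GPU; adjust the index if you have more than one.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(10):
    per_core = psutil.cpu_percent(percpu=True, interval=1.0)
    gpu_util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
    print(
        f"busiest core: {max(per_core):.0f}%  "
        f"mean core: {sum(per_core) / len(per_core):.0f}%  "
        f"gpu: {gpu_util}%"
    )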
Solution I'd like
I have read the source code and found that the main process does too much work: all tasks are handled sequentially. For example, in SubprocessEnvManager:
def _queue_steps(self) -> None:
    # We iterate env_workers sequentially!
    # Is it possible to merge the data from different envs into one batch?
    for env_worker in self.env_workers:
        if not env_worker.waiting:
            env_action_info = self._take_step(env_worker.previous_step)
            ...

def _take_step(self, last_step: EnvironmentStep) -> Dict[BehaviorName, ActionInfo]:
    all_action_info: Dict[str, ActionInfo] = {}
    # We iterate the policies sequentially too!
    # Is it possible to do this in parallel in different processes?
    for brain_name, step_tuple in last_step.current_all_step_result.items():
        if brain_name in self.policies:
            all_action_info[brain_name] = self.policies[brain_name].get_action(
                step_tuple[0], last_step.worker_id
            )
    return all_action_info
In my case the policies (or agents/brains) are created at the start of the game, and none are added or removed during training. So I wonder: is it possible to merge the data from the different envs into one batch? And is it possible to run inference and training in parallel using multiprocessing? A rough sketch of what I mean is below.
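This is not the real ml-agents API; TinyPolicy and batched_get_actions are made-up names. The idea is just that observations from all env workers for the same behavior could be concatenated and sent through a single forward pass instead of one call per worker:

from typing import Dict, List
import numpy as np
import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Stand-in for a per-behavior policy network (hypothetical)."""

    def __init__(self, obs_size: int, act_size: int) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, 128), nn.ReLU(), nn.Linear(128, act_size)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def batched_get_actions(
    policy: TinyPolicy, obs_per_worker: Dict[int, np.ndarray]
) -> Dict[int, np.ndarray]:
    """Merge observations from all env workers, run ONE forward pass,
    then split the actions back out per worker id."""
    worker_ids: List[int] = list(obs_per_worker.keys())
    counts = [obs_per_worker[w].shape[0] for w in worker_ids]  # agents per worker
    batch = torch.as_tensor(
        np.concatenate([obs_per_worker[w] for w in worker_ids], axis=0),
        dtype=torch.float32,
    )
    with torch.no_grad():
        actions = policy(batch).numpy()
    # Split the batched result back into per-worker chunks.
    out: Dict[int, np.ndarray] = {}
    start = 0
    for w, n in zip(worker_ids, counts):
        out[w] = actions[start : start + n]
        start += n
    return out


if __name__ == "__main__":
    policy = TinyPolicy(obs_size=32, act_size=4)
    # e.g. 20 env workers, 6 agents each (3 vs 3)
    obs = {w: np.random.randn(6, 32).astype(np.float32) for w in range(20)}
    actions = batched_get_actions(policy, obs)
    print({w: a.shape for w, a in list(actions.items())[:3]})

Since the set of behaviors is fixed in our game, splitting the batched actions back out per worker is straightforward.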
Any help will be appreciated. Thanks in advance.
Hi, we have this on our radar as part of improving the efficiency of our trainers, and we have also prototyped parallelizing the SubprocessEnvManager. Thanks for bringing this to our attention; we will prioritize releasing these changes when appropriate.
Any progress?
Very much looking forward to this. I'm using a 3960 Threadripper and an RTX 4090 and getting terrible performance: the GPU sits at around 3-5% and most of the CPU idles, with a single thread (out of 48!) at 100%. My training is bottlenecked by the single-core performance of my CPU, and as a result I'm currently forced to do training on my MacBook.
So, anything on the horizon?
Hey! Can you tell me how you set up ml-agents to work on a 4090? Currently PyTorch is the problem: 1.11 is incompatible with the 4090, while 2.0 is too new for ml-agents. I'm unsure what the workaround is and would like to understand how you got it to work.
Thanks!
It's been a while since I set it up, but if I remember correctly I edited the requirements in the setup.py files. There are acceptable version ranges specified in there which you can change (at the risk of a dysfunctional install afterwards, of course).
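Roughly what that edit looks like (a sketch only, not the actual ml-agents setup.py; the real files have many more entries, and the exact version pins depend on the release you have checked out):

# Illustrative sketch of the kind of edit described above -- NOT the real
# ml-agents setup.py. Version numbers are examples; check ml-agents/setup.py
# and ml-agents-envs/setup.py in your checkout for the actual pins.
from setuptools import setup, find_packages

setup(
    name="mlagents",
    version="0.0.0.dev0",  # placeholder
    packages=find_packages(),
    install_requires=[
        # The original range caps torch below what the RTX 4090 needs, e.g.:
        #   "torch>=1.8.0,<=1.11.0"
        # Widening the upper bound lets pip resolve a newer build
        # (at your own risk -- ml-agents is not tested against it):
        "torch>=1.8.0,<2.1.0",
    ],
)

After editing, reinstall the packages from source (pip install -e ./ml-agents-envs ./ml-agents) so the relaxed constraints take effect.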