vivarium-core icon indicating copy to clipboard operation
vivarium-core copied to clipboard

Proposal: Improved Handling of Parallel Processes

Open U8NWXD opened this issue 2 years ago • 3 comments

Current Approach

Message Passing

When a process has Process.parallel set to true, we launch a parallel OS process that runs vivarium.core.process._run_update(). run_update() contains a loop that repeatedly receives (interval, state) tuples from a pipe, passes those arguments to Process.next_update(), and sends the result back through the pipe. To stop the parallel process, we pass interval=-1.

Tracking in Main OS Process

In the main OS process (which contains Engine), we store ParallelProcess instances in Engine.parallel. Then when we need an update from a process, we call Engine._invoke_process, which does the following:

  • If the process is parallel, starts the ParallelProcess computing the update and returns it.
  • If the process is non-parallel, creates and returns an instance of InvokeProcess, which has an interface similar to ParallelProcess but computes the update immediately. Note that there's an extra invoke_process function that computes the process's update, but this extra level of indirection appears unnecessary.

Problems

The way we currently track parallel processes has a number of downsides:

  1. We store ParallelProcess instances in Engine.parallel, but we also store the original Process instances in the store hierarchy (with a reference in self.processes).
    1. This is unnecessary and wastes memory, especially for processes with large parameter dictionaries or internal states.
    2. Having the original Process instances in the store hierarchy is confusing. A user can read out the internal state of those processes with no problem, but they're getting the state from a process that hasn't changed since the beginning of the simulation, which is not intuitive.
  2. There's no way to retrieve any information from a process besides its next update. This is a problem when we want to read out a process's internal state (e.g. for debugging or when working with inner simulations).
  3. The extra levels of indirection are confusing. Every time I work on this, I have to trace through the code again to remind myself what all the "invoke" things do.

Proposed Solution

  1. Eliminate the extra invoke_process function that doesn't appear to do anything.

  2. Instead of storing ParallelProcess instances in self.parallel, put them directly into the store hierarchy with references in self.processes. Once a parallel process has been put on its own OS process, there should be no copies of it left in the main OS process.

  3. Systematize message-passing between the main and parallel OS processes as commands with arguments. Process will have a run_command() method that accepts a command name (string) and a tuple of arguments. The _run_update() function would handle the following commands:

    1. _halt: Ignores arguments and shuts down the parallel process (equivalent of passing interval=-1 currently)
    2. _next_update: Takes (interval, state) as arguments and passes them to Process.next_update()

    Process authors can override run_command() to handle more commands, e.g. to return internal state variables.

    ParallelProcess will also provide run_command(), but instead of running commands itself, it will pass those commands through the pipe to its child process's _run_update() function, which will in turn pass them to Process.run_command().

    I've started implementing this in #198

I think this proposal addresses all the problems described above. However, it brings some new downsides:

  1. Processes in the store hierarchy will not all be Process instances anymore. Some will be ParallelProcess instances. We don't want to make ParallelProcess inherit from Process because it doesn't know enough to do things like generate composites, and adding those missing capabilities is overkill.
  2. This would change the behavior of public functions. I think this is really more of a bug fix than a breaking change since the way we currently store Process instances in the store hierarchy is very counter-intuitive. The biggest breaking change is probably removing Engine.parallel, but I doubt anyone relies on that.

U8NWXD avatar May 27 '22 00:05 U8NWXD