Add utility for graceful Launchpad termination with external schedulers
Summary
Fixes #311
When an Acme distributed experiment runs under an external scheduler such as Ray Tune's ASHA scheduler, the scheduler may stop trials early. The Launchpad processes spawned by the experiment are not terminated along with the trial, however, so they keep running as orphans.
Problem
As described in #311, when Ray Tune's ASHA scheduler terminates a trial, the mp.Process running the Launchpad program is killed, but the child processes spawned by Launchpad continue running as orphans. This happens because the termination signal is not forwarded to the Launchpad processes.
Solution
Added two new utilities to acme/utils/lp_utils.py:
1. LaunchpadProgramStopper (Context Manager)
A context manager that registers signal handlers for SIGTERM and SIGINT. When these signals are received, it calls lp.stop() to gracefully terminate all Launchpad processes.
2. launch_with_termination_handler() (Convenience Function)
A wrapper around lp.launch() that automatically uses the LaunchpadProgramStopper context manager.
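A minimal sketch of how the two utilities above could be built on Python's signal module. The stop_fn and launch_fn parameters (added so the logic can be exercised without Launchpad installed), the run-once guard, and restoring the previous handlers on exit are assumptions of this sketch, not necessarily what the PR's code does.

```python
import signal
import threading
from types import FrameType
from typing import Any, Callable, Dict, Optional, Sequence


def _default_stop() -> None:
  # Imported lazily so the sketch can be exercised without Launchpad.
  import launchpad as lp
  lp.stop()


class LaunchpadProgramStopper:
  """Calls `stop_fn` (lp.stop() by default) at most once on SIGTERM/SIGINT."""

  def __init__(self,
               stop_fn: Callable[[], None] = _default_stop,
               signals: Sequence[int] = (signal.SIGTERM, signal.SIGINT)):
    self._stop_fn = stop_fn
    self._signals = tuple(signals)
    self._previous: Dict[int, Any] = {}
    self.stopped = False

  def _handle(self, signum: int, frame: Optional[FrameType]) -> None:
    # Run-once guard: stopping the program may deliver further signals.
    if self.stopped:
      return
    self.stopped = True
    self._stop_fn()

  def __enter__(self) -> 'LaunchpadProgramStopper':
    # CPython only allows signal.signal() in the main thread.
    if threading.current_thread() is not threading.main_thread():
      raise RuntimeError(
          'LaunchpadProgramStopper must be entered from the main thread.')
    for sig in self._signals:
      self._previous[sig] = signal.signal(sig, self._handle)
    return self

  def __exit__(self, exc_type, exc, tb) -> bool:
    # Restore the handlers that were installed before __enter__.
    for sig, handler in self._previous.items():
      # signal.signal() returns None if the old handler was not set from Python.
      signal.signal(sig, handler if handler is not None else signal.SIG_DFL)
    self._previous.clear()
    return False  # Never swallow exceptions raised in the body.


def launch_with_termination_handler(
    program: Any,
    *args: Any,
    launch_fn: Optional[Callable[..., Any]] = None,
    stop_fn: Callable[[], None] = _default_stop,
    **kwargs: Any) -> Any:
  """Runs launch_fn(program, ...) (lp.launch by default) inside the stopper."""
  if launch_fn is None:
    import launchpad as lp
    launch_fn = lp.launch
  with LaunchpadProgramStopper(stop_fn=stop_fn):
    return launch_fn(program, *args, **kwargs)
```

Because CPython only lets the main thread install signal handlers, both entry points in this sketch have to be called from the main thread of the process that is expected to receive SIGTERM.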
Usage
```python
from acme.jax import experiments
from acme.utils import lp_utils
from ray import tune
from ray.tune.schedulers import ASHAScheduler


def train_function(config):
  # User-defined helper that returns an ExperimentConfig.
  experiment = build_experiment_config(config)
  program = experiments.make_distributed_experiment(
      experiment=experiment, num_actors=1)
  # Use the new utility instead of lp.launch().
  lp_utils.launch_with_termination_handler(program)


tuner = tune.Tuner(
    train_function,
    tune_config=tune.TuneConfig(scheduler=ASHAScheduler(...)),
)
tuner.fit()
```
Alternatively, use the context manager directly:
```python
import launchpad as lp
from acme.utils import lp_utils

with lp_utils.LaunchpadProgramStopper():
  lp.launch(program, lp.LaunchType.LOCAL_MULTI_PROCESSING)
```
Testing
- Verified that the syntax is valid with python3 -m py_compile.
- Follows the existing signal-handling patterns used in acme/utils/signals.py.