
Models with the same inference_pool_gid still create a new InferencePool and spawn N parallel workers

Open dinispeixoto opened this issue 5 months ago • 0 comments

Description

PR #2040 introduced a great feature that lets users define custom inference pools per model, instead of sharing a single pool across models with different loads.

However, there's a small bug in mlserver/parallel/registry.py on the code path taken when no environment tarball is provided:

if not env_tarball:
    return (
        self._pools.setdefault(
            inference_pool_gid,
            InferencePool(self._settings, on_worker_stop=self._on_worker_stop),
        )
        if inference_pool_gid
        else self._default_pool
    )

If inference_pool_gid already exists in self._pools, a new InferencePool instance is still constructed (and thus spawns N new worker processes), because Python evaluates setdefault's default argument before the key lookup ever happens.

From the InferencePool constructor:

def __init__(
    self,
    settings: Settings,
    env: Optional[Environment] = None,
    on_worker_stop: List[InferencePoolHook] = [],
):
    configure_inference_pool(settings)

    ...
    for _ in range(self._settings.parallel_workers): # spawning Python processes
        worker = _spawn_worker(self._settings, self._responses, self._env)
        self._workers[worker.pid] = worker  # type: ignore

This leads to redundant process creation even when the pool already exists.

Steps to reproduce

  1. Create two ML models with the same inference_pool_gid, e.g. in model-settings.json:
{
    "name": "foo",
    "implementation": "...",
    "parameters": {
        "inference_pool_gid": "bar"
    }
}

  2. Set parallel_workers to 2 in settings.json:
{
    "debug": true,
    "use_structured_logging": true,
    "parallel_workers": 2
}
  3. Start MLServer and check the number of worker processes:
ps -ef | grep spawn_main | grep python | wc -l

Expected: 4 processes (2 for the default pool + 2 for the custom bar pool).
Observed: 6 processes, because the second model's load constructs a second, throwaway InferencePool whose 2 workers are never used.

This can also be demonstrated in a Python shell:

>>> class Foo:
...     def __init__(self):
...             print("hello")
... 
>>> bar = {}
>>> bar.setdefault("1", Foo())
hello
<__main__.Foo object at 0x104916da0>
>>> bar.setdefault("1", Foo())
hello
<__main__.Foo object at 0x104916da0>
>>> 

setdefault() constructs a new Foo() on every call because its default argument is evaluated before the key lookup takes place.
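A common way to avoid the eager construction is to guard the assignment with an explicit membership test. This is a generic sketch of the pattern (not the MLServer patch); the instance counter is added only to make the difference observable:

```python
class Foo:
    instances = 0  # counts constructions, for demonstration only

    def __init__(self):
        Foo.instances += 1
        print("hello")


bar = {}


def get_or_make(d, key):
    # Unlike d.setdefault(key, Foo()), Foo() is only evaluated on a miss
    if key not in d:
        d[key] = Foo()
    return d[key]


first = get_or_make(bar, "1")   # prints "hello"
second = get_or_make(bar, "1")  # no print; the cached instance is returned
```

Because the constructor call sits inside the `if` body, it is never evaluated when the key is already present.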

Impact

  • Redundant worker processes are spawned on every duplicate pool lookup.
  • These processes are never used for inference but remain alive.
  • Can lead to high memory usage and degraded performance in production environments with multiple models or high worker counts.
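A possible fix is to replace the setdefault call in registry.py with an explicit lookup, so the InferencePool constructor (and its worker spawning) only runs on a cache miss. The sketch below uses hypothetical names (get_or_create_pool, make_pool) and a counting stand-in for InferencePool; it is not the actual MLServer patch:

```python
from typing import Callable, Dict, Optional


def get_or_create_pool(
    pools: Dict[str, object],
    gid: Optional[str],
    default_pool: object,
    make_pool: Callable[[], object],
) -> object:
    # Mirrors the `if not env_tarball:` branch, but only constructs a pool
    # (and therefore only spawns workers) when the gid is new.
    if not gid:
        return default_pool

    pool = pools.get(gid)
    if pool is None:
        pool = make_pool()  # evaluated lazily, on a cache miss only
        pools[gid] = pool
    return pool


# Demonstration: the factory records how many pools were constructed.
created = []
pools: Dict[str, object] = {}
default = object()

p1 = get_or_create_pool(pools, "bar", default, lambda: created.append(1) or object())
p2 = get_or_create_pool(pools, "bar", default, lambda: created.append(1) or object())
```

With this shape, the second lookup for "bar" returns the cached pool and the factory never runs a second time, so no extra workers are spawned.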

dinispeixoto, Oct 09 '25 08:10