Race condition accessing generators during multiprocessing

Open jmartin-tech opened this issue 1 month ago • 1 comments

#1414 exposed a race condition when requesting inference results from a Generator held by a probe while parallel_attempts is enabled. When multiprocessing dispatches _execute_attempt from the base Probe class, the probe and every object it holds are serialized via pickle and restored in the child process for execution. To make this work, some generators alter the state of the original instance, removing objects that would fail serialization, and recreate those objects on deserialization in the newly created child process. Because deserialization does not occur in the core process, the original instance is left in a state where these objects have been removed. The _call_model method, which may access these objects, checks the state of the instance on entry, but the asynchronous nature of the multiprocessing queue means the generator's client object references can be removed before the method completes.
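
The clear-on-pickle pattern described above can be reproduced outside garak. The following is a minimal sketch, not garak's actual code: `SketchGenerator` and its echo client are hypothetical stand-ins, and only the method names `_load_client` and `_clear_client` are borrowed from this issue. It assumes the client is dropped in `__getstate__` and rebuilt in `__setstate__`.

```python
import pickle


class SketchGenerator:
    """Hypothetical stand-in for a generator holding an unpicklable client."""

    def __init__(self):
        self.client = None
        self._load_client()

    def _load_client(self):
        # A locally defined function stands in for a real API client;
        # like many client objects, it cannot be pickled.
        def client(prompt):
            return f"echo: {prompt}"

        self.client = client

    def _clear_client(self):
        self.client = None

    def __getstate__(self):
        # Mutates the ORIGINAL instance so that pickling can succeed.
        self._clear_client()
        return dict(self.__dict__)

    def __setstate__(self, state):
        # Runs only on the copy being restored, never on the original.
        self.__dict__.update(state)
        self._load_client()


original = SketchGenerator()
restored = pickle.loads(pickle.dumps(original))

restored_ok = restored.client is not None   # the copy has a fresh client
original_cleared = original.client is None  # the original lost its client
print(restored_ok, original_cleared)
```

In this sketch the restored copy ends up with a working client while the original is left without one, which is the state that a concurrent `_call_model` on the original instance could observe partway through its work.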

#1464 reduces this issue for OpenAICompatible generators by holding local variable references to the objects cleared by _clear_client(). Further assurances are needed, and ReplicateGenerator and MistralGenerator may be susceptible to similar conditions.

Expected behavior

Generator access on any process should be viable during runtime.

Current behavior

Generators raise exceptions on access to NoneType objects when serialization occurs after the method checks state but before it accesses the required objects.

garak version

0.13.0

Nov 10 '25 15:11 jmartin-tech

One idea for addressing this may be to expect _load_client() to return any objects it sets, and to ensure that _call_model accesses local references rather than self attributes.
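
A rough sketch of that idea follows, under stated assumptions: `SketchGenerator`, the echo client, and the `on_midpoint` hook are all hypothetical and exist only to simulate the client being cleared partway through a call. Real generators would return whatever client objects they actually construct.

```python
class SketchGenerator:
    """Hypothetical generator; method names mirror the issue, bodies are illustrative."""

    def __init__(self):
        self.client = None

    def _load_client(self):
        # Build the client, store it on the instance, and also return it.
        def client(prompt):
            return f"echo: {prompt}"  # stand-in for a real inference call

        self.client = client
        return client

    def _clear_client(self):
        self.client = None

    def _call_model(self, prompt, on_midpoint=None):
        # Read self.client exactly once, then work only with the local name.
        client = self.client
        if client is None:
            client = self._load_client()
        if on_midpoint is not None:
            # Hypothetical hook: simulates serialization clearing the
            # attribute after the state check but before the client is used.
            on_midpoint()
        return client(prompt)


gen = SketchGenerator()
# Clearing the attribute mid-call does not affect the local reference.
result = gen._call_model("hello", on_midpoint=gen._clear_client)
print(result)
```

Because the method never re-reads `self.client` after taking the local reference, a `_clear_client()` triggered by pickling in between cannot turn the reference into `None`. The next call simply reloads the client if the attribute was cleared.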

Nov 10 '25 15:11 jmartin-tech