garak
`_config` object contains `state` not suitable for multiprocessing
Overview
Many `garak._config` namespace variables, such as `transient`, `run`, and `reporting`, are currently populated as state during execution. Managing these values between test runs in a long-running service does not lend itself to a singleton, because only one of each can then exist at a time in the Python interpreter. Further, since this data is populated into a namespace variable, reloading the namespace will not preserve these values.
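To illustrate the reload concern, here is a minimal sketch using a throwaway module. The module name `demo_config` and its `hitlogfile` attribute are stand-ins written to a temporary directory for this example, not garak's actual `_config`.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in module with one module-level default.
    pathlib.Path(tmp, "demo_config.py").write_text("hitlogfile = None\n")
    sys.path.insert(0, tmp)
    try:
        import demo_config

        # Simulate state being populated during a run.
        demo_config.hitlogfile = "populated at run time"

        # Reloading re-executes the module body, so the default is restored.
        importlib.reload(demo_config)
        value_after_reload = demo_config.hitlogfile
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_config", None)

print(value_after_reload)
```

The value set at run time is gone after the reload because the module body reassigns the name.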
The behavior of the `multiprocessing` package is more akin to launching a new Python process that uses only the files directly connected to the source objects passed.
#645 addresses an instance that exposed this concern.
Current state example
When the `probe` class and its associated class hierarchy are called, the only argument passed is defined in the same class, and the state of config is not passed.
Desired state
Any stateful `_config` likely needs to be either passed or consistently reloaded in new processes. Investigation of the use cases needs to occur so we can consolidate on patterns, either in code standards or in a library framework that supports task execution with all required context.
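One possible shape for the "passed" option is sketched below. It assumes the relevant state can be copied into a plain, picklable dict. The `run` and `reporting` objects, `snapshot_config`, and `evaluate` are hypothetical stand-ins, not garak's API. A pickle round trip stands in for the process boundary, since `multiprocessing` pickles the arguments it sends to workers.

```python
import pickle
from types import SimpleNamespace

# Stand-ins for module-level namespaces populated during a run (hypothetical).
run = SimpleNamespace(seed=42)
reporting = SimpleNamespace(report_prefix="demo")


def snapshot_config() -> dict:
    """Copy the current namespace values into a plain, picklable dict."""
    return {"run": dict(vars(run)), "reporting": dict(vars(reporting))}


def evaluate(prompt: str, config: dict) -> str:
    """Worker function: reads config only from its argument, never from globals."""
    return f"seed={config['run']['seed']} prompt={prompt}"


snapshot = snapshot_config()

# Simulate crossing a process boundary: arguments are pickled and unpickled.
restored = pickle.loads(pickle.dumps(snapshot))
result = evaluate("hello", restored)
print(result)
```

With a real pool, the same snapshot could be sent alongside each work item, for example via `Pool.starmap(evaluate, [(p, snapshot) for p in prompts])`.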
When addressing this:
- [ ] Consider adding an `assert` or similar to track `hitlogfile` state - as either `None` or an `io.TextIOBase` whose `.closed==False`. See this comment for details: https://github.com/leondz/garak/pull/665#issue-2285198142
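A minimal sketch of such a check follows. The helper names are hypothetical, and where the `hitlogfile` value actually lives in `_config` is left to the caller.

```python
import io


def hitlogfile_is_valid(hitlogfile) -> bool:
    """Return True if hitlogfile is None or an open text stream."""
    if hitlogfile is None:
        return True
    return isinstance(hitlogfile, io.TextIOBase) and not hitlogfile.closed


def assert_hitlogfile_state(hitlogfile) -> None:
    """Raise AssertionError when hitlogfile is in an unexpected state."""
    assert hitlogfile_is_valid(hitlogfile), (
        f"hitlogfile must be None or an open io.TextIOBase, got {hitlogfile!r}"
    )


# Example: an in-memory text stream is valid while open, invalid once closed.
stream = io.StringIO()
open_ok = hitlogfile_is_valid(stream)
stream.close()
closed_ok = hitlogfile_is_valid(stream)
```

Note that a bare `assert` is skipped when Python runs with `-O`, so the boolean helper may be the safer building block if the check must always run.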
How should we triage/roadmap this fella?
There is some ideation about possibly injecting a shared memory location to sync config across process and thread boundaries. Maybe set a goal to introduce this during this quarter.
The idea proposed is for module-level code in `_config` to search for an existing shared memory object that would be created in the parent runtime when `_config` is locked to start a run.
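A rough sketch of that lookup, using the standard library's `multiprocessing.shared_memory`, might look like the following. The block name, the JSON encoding, the length-prefix layout, and both function names are assumptions made for illustration, not a settled design.

```python
import json
import struct
from multiprocessing import shared_memory

SHM_NAME = "garak_config_sketch"  # hypothetical well-known block name
_HEADER = struct.Struct("<I")  # 4-byte payload length prefix


def publish_config(config: dict, name: str = SHM_NAME) -> shared_memory.SharedMemory:
    """Parent side: create the block once config is locked for a run."""
    payload = json.dumps(config).encode("utf-8")
    shm = shared_memory.SharedMemory(
        name=name, create=True, size=_HEADER.size + len(payload)
    )
    # The OS may round the block size up, so store the real payload length.
    _HEADER.pack_into(shm.buf, 0, len(payload))
    shm.buf[_HEADER.size:_HEADER.size + len(payload)] = payload
    return shm  # caller must close() and unlink() when the run ends


def load_published_config(name: str = SHM_NAME):
    """Child side: return the published config, or None if no block exists."""
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return None
    try:
        (length,) = _HEADER.unpack_from(shm.buf, 0)
        payload = bytes(shm.buf[_HEADER.size:_HEADER.size + length])
    finally:
        shm.close()
    return json.loads(payload.decode("utf-8"))
```

In this sketch the block is written once and only read afterwards, so no locking is shown. A design that updates config mid-run would need some form of synchronization on top.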