Grid2Op
grid2op.make(...) not threadsafe?
Environment
- Grid2op version: 1.10.2
- System: Windows 11
- Python: 3.11.9
Bug description
When using Python's multiprocessing Pool or concurrent.futures.ThreadPoolExecutor, initializing more than one environment in parallel (asynchronously) can raise errors in some of the threads because attributes of the backend (or of other grid objects) are not properly defined. I suspect this is because the threads all initialize from the same resources on disk (i.e. the .json and chronics files), which are locked by the first thread that grabs them.
How to reproduce
import grid2op
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor

def make_env():
    env = grid2op.make("rte_case14_realistic")
    obs = env.reset()
    print(obs)
    return obs

futures = set()
with ThreadPoolExecutor(max_workers=mp.cpu_count()-1) as executor:
    for i in range(mp.cpu_count() - 1):
        futures.add(executor.submit(make_env))

for item in futures:
    print(item)
This usually raises a Backend error or TypeError in a subset of the function calls. These errors correspond to attributes of the Backend being None or -1 (i.e. the Backend was not successfully initialized).
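Printing the Future objects only shows the exception type. To see the actual messages, something like the sketch below could be used. `run_and_report` is a hypothetical helper name, not part of grid2op; it relies only on the standard `concurrent.futures` API and accepts any zero-argument callable, such as `make_env` above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_and_report(task, n_tasks, max_workers=4):
    """Run `task` n_tasks times in a thread pool and summarize the outcomes.

    `task` is any zero-argument callable (e.g. the make_env function above).
    Returns (results, errors), where errors holds the raised exceptions.
    """
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(task) for _ in range(n_tasks)]
        for future in as_completed(futures):
            exc = future.exception()  # None if the task finished normally
            if exc is None:
                results.append(future.result())
            else:
                errors.append(exc)
                # Show the exception type and message, not just the Future repr
                print(f"{type(exc).__name__}: {exc}")
    return results, errors
```

For the reproduction above this would be called as `results, errors = run_and_report(make_env, n_tasks=mp.cpu_count() - 1)`, and `errors` would then contain the exceptions from the failed initializations.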
Current output
Some of the threads return errors:
<Future at 0x1db68531fd0 state=finished returned CompleteObservation_unknown>
<Future at 0x1db710ad850 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66b26650 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66d48a90 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66e2f690 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66c9e6d0 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66d48d10 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66e2d110 state=finished raised TypeError>
<Future at 0x1db66e2c3d0 state=finished raised ValueError>
<Future at 0x1db6b5097d0 state=finished raised ValueError>
<Future at 0x1db6accfd50 state=finished returned CompleteObservation_unknown>
<Future at 0x1db6ac94550 state=finished returned CompleteObservation_rte_case14_realistic>
<Future at 0x1db6d581b50 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66d48d90 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66d4b9d0 state=finished returned CompleteObservation_unknown>
Expected output
All threads successfully return the initial observation:
<Future at 0x1db68531fd0 state=finished returned CompleteObservation_unknown>
<Future at 0x1db710ad850 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66b26650 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66d48a90 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66e2f690 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66c9e6d0 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66d48d10 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66e2d110 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66e2c3d0 state=finished returned CompleteObservation_unknown>
<Future at 0x1db6b5097d0 state=finished returned CompleteObservation_unknown>
<Future at 0x1db6accfd50 state=finished returned CompleteObservation_unknown>
<Future at 0x1db6ac94550 state=finished returned CompleteObservation_unknown>
<Future at 0x1db6d581b50 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66d48d90 state=finished returned CompleteObservation_unknown>
<Future at 0x1db66d4b9d0 state=finished returned CompleteObservation_unknown>
Temporary Fix
Put grid2op.make(...) inside a for loop with try-except (not recommended!):
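import time

N_ATTEMPTS = 10
for _ in range(N_ATTEMPTS):
    try:
        env = grid2op.make(dataset=self.env_name, backend=env_params.backend(), **grid2op_params)
        break
    except Exception:  # Resource is busy, wait and retry
        time.sleep(1.0)
else:
    # Every attempt failed, so env was never assigned
    raise RuntimeError(f"grid2op.make failed after {N_ATTEMPTS} attempts")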
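A possibly cleaner alternative to retrying, assuming the failures really do come from concurrent initialization (which is not confirmed), is to serialize only the creation step behind a lock. This is a minimal sketch: `make_serialized` and `_MAKE_LOCK` are hypothetical names, not part of grid2op, and the factory is passed in as a plain callable.

```python
import threading

# Hypothetical helper, not part of grid2op: one process-wide lock that
# serializes environment creation across threads.
_MAKE_LOCK = threading.Lock()


def make_serialized(factory, *args, **kwargs):
    """Call factory(*args, **kwargs) while holding a global lock.

    Only the creation step is serialized; the returned object can then be
    used by the calling thread without holding the lock.
    """
    with _MAKE_LOCK:
        return factory(*args, **kwargs)
```

Inside `make_env` this would read `env = make_serialized(grid2op.make, "rte_case14_realistic")`. Note that a `threading.Lock` only coordinates threads within a single process, so it would not help with a multiprocessing Pool.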