[Feat] ParallelModuleQueue (python multiprocessing): don't wait for entire block to finish before pulling new processes
The option to run GRASS modules in parallel (in Python) is implemented via the ParallelModuleQueue class. The standard way (?) is to define a processing queue with an nprocs parameter, add the GRASS modules to be executed in parallel via the put() method, and finally start the parallel processing using the wait() method.
As currently implemented, the queue seems to run nprocs processes at a time and to wait for all of them to finish before starting the next "block" of processes. This means that the longest process determines the duration of an entire processing "block".
Ideally, free slots could be filled directly with pending processes from the queue instead.
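To put a number on the difference, here is a small sketch that only simulates the two scheduling strategies; it does not touch ParallelModuleQueue itself. The function names and the run times are made up for illustration.

```python
import heapq


def block_makespan(durations, nprocs):
    """Total time when each block of nprocs jobs must finish before the next starts."""
    total = 0
    for start in range(0, len(durations), nprocs):
        # The slowest job in the block decides how long the block takes.
        total += max(durations[start:start + nprocs])
    return total


def slot_filling_makespan(durations, nprocs):
    """Total time when a pending job starts as soon as any slot becomes free."""
    # Min-heap of the times at which each slot becomes free again.
    slots = [0] * nprocs
    heapq.heapify(slots)
    for duration in durations:
        free_at = heapq.heappop(slots)  # earliest free slot
        heapq.heappush(slots, free_at + duration)
    return max(slots)


# Hypothetical run times (in seconds) of six queued modules, with nprocs = 2.
durations = [10, 1, 1, 1, 1, 1]
print(block_makespan(durations, 2))         # 12
print(slot_filling_makespan(durations, 2))  # 10
```

With one long job and several short ones, the block strategy leaves a slot idle while the long job finishes, whereas slot filling lets the short jobs run alongside it.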
I agree that is a problem, which is partly why I usually just use the standard Python multiprocessing.Pool methods (like map_async) with run_command. Just curious, do you prefer ParallelModuleQueue for some specific reason?
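For reference, a minimal sketch of that pool-based approach. It makes a few assumptions: it swaps in multiprocessing.pool.ThreadPool (same map_async API as Pool) on the grounds that each GRASS module already runs as its own subprocess, and the names run_module, run_all and the (name, kwargs) task format are invented for this example.

```python
from multiprocessing.pool import ThreadPool


def run_module(task):
    """Run one GRASS module; task is a (module_name, keyword_arguments) pair."""
    # Imported lazily so the sketch can be loaded outside a GRASS session.
    import grass.script as gs

    name, kwargs = task
    return gs.run_command(name, **kwargs)


def run_all(tasks, nprocs, worker=run_module):
    """Run tasks with at most nprocs at a time, refilling slots as they free up."""
    with ThreadPool(processes=nprocs) as pool:
        # chunksize=1 hands out one task at a time, so a free worker
        # immediately picks up the next pending task.
        result = pool.map_async(worker, tasks, chunksize=1)
        # Results come back in the same order as the tasks.
        return result.get()
```

Inside a GRASS session this could be called as run_all([("r.slope.aspect", {"elevation": "dem", "slope": "slope"})], nprocs=4), where the map names dem and slope are placeholders. The worker argument is only there so the scheduling can be tried with a plain Python function.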
No, not at all, I am just used to using it since it is the pygrass way ;)
Also, some GRASS modules from the temporal framework use ParallelModuleQueue, e.g. for aggregation:
https://github.com/OSGeo/grass/blob/1961472afeb7633c9b744b0a60c923fb9b1d4411/python/grass/temporal/aggregation.py#L267