Support of distributed steps
Originally created on Thu, 25 Jan 2018 12:17:16 +0200
We would like to be able to run steps of the same project in parallel on several machines to reduce build latency. Possible backends: Docker clusters, Jenkins, Mesos, or others.
Input from customers
- Launching and executing a project must not differ from a launch without the distributed build option. This means logs and output must be almost identical.
- We should avoid creating new jobs/configurations if possible.
- This feature must include and absorb the background steps feature. This means all configuration, usage and logs must be identical for both features. The difference between background and distributed steps should not be expressed through different syntax or logic. Rather, it should be a global or per-step config option.
- Parallelism must not be based on the steps hierarchy.
- There must be a way to specify which steps must run sequentially relative to other steps, and which steps can run in parallel.
- Discretionary waiting for steps must be supported.
Proposal for controlling parallelism
- Introduce thread id to config.
- Steps with the same id are always sequential.
- Steps with different ids can be launched in parallel.
- Steps without a thread id obtain the default one (e.g. 0) and are executed sequentially.
- There must be explicit or implicit splitting (thread creation) and joining (thread waiting) points. The initial split could probably be triggered by a non-default thread id. For example, all steps without an id could work as an implicit join.
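The rules above could be sketched roughly as follows. This is only an illustration, not an existing implementation: the function name `plan_stages`, the `(name, thread_id)` step representation and the choice of 0 as the default id are assumptions made for the example.

```python
DEFAULT_THREAD_ID = 0  # assumed default for steps without an explicit thread id


def plan_stages(steps):
    """Split a flat list of (name, thread_id) steps into sequential stages.

    Each stage maps a thread id to the ordered list of its step names.
    Stages run one after another; lanes inside a stage may run in parallel.
    A step without a thread id closes the current stage (implicit join).
    """
    stages = []
    lanes = {}  # thread id -> steps collected since the last join
    for name, thread_id in steps:
        if thread_id in (None, DEFAULT_THREAD_ID):
            if lanes:
                stages.append(lanes)  # join all threads started so far
                lanes = {}
            stages.append({DEFAULT_THREAD_ID: [name]})
        else:
            # Same id -> same lane, so these steps stay sequential.
            lanes.setdefault(thread_id, []).append(name)
    if lanes:
        stages.append(lanes)  # join whatever is still running at the end
    return stages


# Hypothetical config: two parallel lanes between two sequential steps.
example_plan = plan_stages([
    ("checkout", None),
    ("build_linux", 1),
    ("build_windows", 2),
    ("test_linux", 1),
    ("report", None),
])
# example_plan == [
#     {0: ["checkout"]},
#     {1: ["build_linux", "test_linux"], 2: ["build_windows"]},
#     {0: ["report"]},
# ]
```

Planning the stages separately from executing them keeps the join points visible in one place, whichever backend ends up running the lanes.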
Ideally, parallel steps should also be displayed as parallel in Blue Ocean.
There must be a way to select nodes/PCs for specific parallel steps. These nodes could be specialized in some way. For example, they can have connections to hardware, which is needed for testing.
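One possible way to express node selection is to let each node advertise a set of capabilities and let each step list the ones it requires. The sketch below assumes exactly that; `select_node`, the node names and the capability strings are all hypothetical.

```python
def select_node(nodes, required):
    """Return the name of the first node offering every required capability.

    `nodes` maps a node name to an iterable of capability strings.
    Returns None when no node qualifies.
    """
    required = set(required)
    for name, capabilities in nodes.items():
        if required <= set(capabilities):  # subset check
            return name
    return None


# Hypothetical pool: only one machine is connected to the test hardware.
NODES = {
    "builder-1": {"linux"},
    "lab-pc": {"linux", "hardware"},
}
chosen = select_node(NODES, ["hardware"])  # picks "lab-pc"
```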
In my opinion, it would be better to use a queue label (a string value) instead of a thread id (a number). Unlike a number, a label is descriptive.
For example, when I need to run all my tests sequentially (not in parallel), I will use the queue label test instead of thread id 1.
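A minimal sketch of the queue label idea, assuming steps are plain dictionaries with a `name` and an optional `queue` key. These keys, the `"default"` label and both function names are invented for illustration. Steps sharing a label stay in order, while different labels run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_QUEUE = "default"  # hypothetical label for steps without one


def group_by_queue(steps):
    """Group step names by queue label, preserving their original order."""
    queues = {}
    for step in steps:
        label = step.get("queue", DEFAULT_QUEUE)
        queues.setdefault(label, []).append(step["name"])
    return queues


def run_queues(queues, run_step):
    """Run every queue in its own worker; steps inside a queue stay sequential."""
    def drain(names):
        return [run_step(name) for name in names]

    with ThreadPoolExecutor(max_workers=len(queues) or 1) as pool:
        futures = {label: pool.submit(drain, names)
                   for label, names in queues.items()}
        return {label: future.result() for label, future in futures.items()}


# All tests share the label "test", so they never run in parallel with each other.
queues = group_by_queue([
    {"name": "build", "queue": "build"},
    {"name": "unit_tests", "queue": "test"},
    {"name": "integration_tests", "queue": "test"},
])
```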
A totally different approach is to allow pausing and resuming execution between steps. In this case, one execution of the config can be started on one PC and continued on another. However, this approach doesn't improve parallelism.
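For comparison, pausing and resuming could be sketched as a checkpoint stored between steps. The sketch assumes the state file lives somewhere both PCs can reach. The function `run_with_checkpoint`, its `max_steps` parameter and the JSON layout are hypothetical.

```python
import json
from pathlib import Path


def run_with_checkpoint(steps, run_step, state_file, max_steps=None):
    """Run steps in order, recording progress after each one.

    Returns True when all steps are done, False when paused after
    `max_steps` steps. A later call with the same state file resumes
    from the first step that has not been executed yet.
    """
    state_path = Path(state_file)
    if state_path.exists():
        next_index = json.loads(state_path.read_text())["next"]
    else:
        next_index = 0

    executed = 0
    while next_index < len(steps):
        if max_steps is not None and executed >= max_steps:
            return False  # paused; progress is already saved
        run_step(steps[next_index])
        next_index += 1
        executed += 1
        state_path.write_text(json.dumps({"next": next_index}))
    return True
```

The steps still run strictly one after another, which matches the remark above that this approach does not add parallelism.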