Support of distributed steps

Open k-dovgan opened this issue 6 years ago • 3 comments

Originally created on Thu, 25 Jan 2018 12:17:16 +0200

We would like to be able to run steps of the same project in parallel on several machines to reduce build latency. Possible choices: Docker clusters, Jenkins, Mesos, or others.

Input from customers

  1. Launching and executing a project must not differ from a launch without the distributed build option. This means logs and output must be almost identical.
  2. We should avoid creating new jobs/configurations if possible.
  3. This feature must include and absorb the background steps feature. This means all configuration, usage, and logs must be identical for both features. The difference between background and distributed steps should not be defined by different syntax or logic. Rather, it should be a global or per-step config option.
  4. Parallelism must not be based on the steps hierarchy.
  5. There must be a way to specify which steps have to run sequentially relative to other steps, and which steps can run in parallel.
  6. Discretionary waiting for steps must be supported.

Proposal for controlling parallelism

  1. Introduce a thread id in the config.
  2. Steps with the same id are always sequential.
  3. Steps with different ids can be launched in parallel.
  4. Steps without a thread id get a default one (e.g. 0) and are executed sequentially.
  5. There must be explicit or implicit splitting (thread creation) and joining (thread waiting) points. Initial splitting could probably be based on a step having a non-default thread id. For example, all steps without an id could work as an implicit join.
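
The rules above can be sketched in plain Python. This is only an illustration under stated assumptions, not an existing Universum API: the step dictionaries, the `thread_id` key, and the `run_steps` function are hypothetical names, and each thread id is modeled as a local single-worker pool rather than a remote machine.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

DEFAULT_THREAD_ID = 0  # hypothetical default for steps without an explicit id


def run_steps(steps):
    """Run steps in config order and return step names in completion order."""
    lanes = {}                 # thread id -> single-worker executor
    pending = []               # futures started since the last join point
    finished = []              # completion log shared by all lanes
    log_lock = threading.Lock()

    def execute(step):
        step["action"]()
        with log_lock:
            finished.append(step["name"])

    def join_all():
        # Block until every outstanding step is done; re-raise its errors.
        for future in pending:
            future.result()
        pending.clear()

    try:
        for step in steps:
            thread_id = step.get("thread_id", DEFAULT_THREAD_ID)
            if thread_id == DEFAULT_THREAD_ID:
                # A step without an id acts as an implicit join point.
                join_all()
                execute(step)
            else:
                # One worker per id keeps steps with the same id sequential,
                # while different ids can run at the same time.
                if thread_id not in lanes:
                    lanes[thread_id] = ThreadPoolExecutor(max_workers=1)
                pending.append(lanes[thread_id].submit(execute, step))
        join_all()
    finally:
        for lane in lanes.values():
            lane.shutdown()
    return finished


if __name__ == "__main__":
    example = [
        {"name": "checkout", "action": lambda: None},
        {"name": "build_a", "thread_id": 1, "action": lambda: None},
        {"name": "test_a", "thread_id": 1, "action": lambda: None},
        {"name": "build_b", "thread_id": 2, "action": lambda: None},
        {"name": "report", "action": lambda: None},
    ]
    print(run_steps(example))
```

In this sketch `checkout` always finishes first and `report` always finishes last, while the relative order of the two lanes in between is not fixed.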

Ideally, the parallel steps should also be displayed as parallel in Blue Ocean.

k-dovgan avatar Jan 29 '19 12:01 k-dovgan

There must be a way to select nodes/PCs for specific parallel steps. These nodes could be specialized in some way. For example, they could be connected to hardware that is needed for testing.

i-keliukh avatar Feb 06 '19 11:02 i-keliukh

In my opinion, it would be better to use a queue label (a string value) instead of a thread id (a number). Unlike a number, a label is descriptive.

For example, when I need to run all my tests sequentially (not in parallel), I would use the queue label test instead of thread id 1.
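
A minimal sketch of how such labels could group steps, assuming a hypothetical `queue` key on each step and a hypothetical `group_by_queue` helper (neither is an existing Universum option): steps that share a label end up in the same ordered queue, and separate queues are the candidates for parallel execution.

```python
def group_by_queue(steps, default_queue="default"):
    """Map each queue label to its step names, preserving config order."""
    queues = {}
    for step in steps:
        # Steps without a label fall into the (hypothetical) default queue.
        label = step.get("queue", default_queue)
        queues.setdefault(label, []).append(step["name"])
    return queues


if __name__ == "__main__":
    example = [
        {"name": "unit tests", "queue": "test"},
        {"name": "build docs", "queue": "docs"},
        {"name": "integration tests", "queue": "test"},
    ]
    print(group_by_queue(example))
```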

i-savynikh avatar Feb 07 '19 12:02 i-savynikh

A totally different approach is to allow pausing and resuming the execution between steps. In this case, one execution of the config can be started on one PC and continued on another. However, this approach doesn't improve parallelism.

i-keliukh avatar Sep 07 '20 06:09 i-keliukh