
parallel: id & examples

Open casperdcl opened this issue 2 years ago • 9 comments

  1. Expose task index via environment variables, similar to:
  2. Add a minimal working example to the docs using parallelism = 8 and script = "... some_conditional_fork_and_join_code($TPI_PARALLEL_INDEX, $TPI_PARALLEL_TOTAL) ..."

casperdcl avatar May 17 '22 18:05 casperdcl

NODE_INDEX & NODE_TOTAL — front-end versus back-end

[Image: front-end-vs-back-end-1]

0x2b3bfa0 avatar May 17 '22 18:05 0x2b3bfa0

Note that running different code on each instance is not easy: determining the node index requires a few orchestrator building blocks.

0x2b3bfa0 avatar May 17 '22 18:05 0x2b3bfa0

idk what you mean by different code. I'm talking about same code, different logic-branch owing to different env vars.

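# script
import math
import os

# Environment variables are strings, so convert them to integers.
index = int(os.environ.get('TPI_PARALLEL_INDEX', 0))
total = int(os.environ.get('TPI_PARALLEL_TOTAL', 1))

tasks = 1337
batch_size = int(math.ceil(tasks / total))
# Each instance handles its own contiguous batch, clamped to the task count.
for step in range(index * batch_size, min((index + 1) * batch_size, tasks)):
    do_work(step)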
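
The batching arithmetic above can be checked locally, without any instances. This is only a sketch: `batch_range` is a hypothetical helper name, and the index and total are passed as plain arguments rather than read from the proposed (not yet existing) `TPI_PARALLEL_INDEX` and `TPI_PARALLEL_TOTAL` variables.

```python
import math


def batch_range(index: int, total: int, tasks: int) -> range:
    """Return the task steps assigned to instance `index` out of `total`.

    Hypothetical helper for illustration; not part of the provider.
    """
    batch_size = math.ceil(tasks / total)
    # Clamp both ends so trailing instances get a short or empty batch.
    start = min(index * batch_size, tasks)
    stop = min((index + 1) * batch_size, tasks)
    return range(start, stop)


# Simulate parallelism = 8 locally: every step should be covered exactly once.
covered = [step for index in range(8) for step in batch_range(index, 8, 1337)]
assert covered == list(range(1337))
```

With 1337 tasks and 8 instances the batch size is 168, so the last instance gets a shorter batch instead of running past the end.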

casperdcl avatar May 18 '22 07:05 casperdcl

> I'm talking about same code, different logic-branch

Also known as “different code” or, in other words, function parallelism.

0x2b3bfa0 avatar May 18 '22 11:05 0x2b3bfa0

PARALLEL_TOTAL is the same as parallelism and is straightforward to implement.

PARALLEL_INDEX is not straightforward to implement: it requires synchronization to avoid having several machines with the same index.

If you add this to “in progress”, expect me to spend a couple weeks doing what we're supposed to do two quarters from now; i.e. determine whether to reinvent the orchestrator[^1] or not and, if advisable, reinvent it.

[^1]: It always begins with Raft & Serf, and then you feel the need to add a command-line tool, some extra supporting services... and you have an orchestrator, identical to the existing ones, but admittedly less elegant.

0x2b3bfa0 avatar May 18 '22 14:05 0x2b3bfa0

> PARALLEL_INDEX is not straightforward to implement

Really? Argh. Backlogging.

casperdcl avatar May 18 '22 15:05 casperdcl

Note to future self: it's also possible to hack something with a cloud-managed atomic queue, popping items when instances boot and pushing them when they're about to terminate. 🤷🏼‍♂️

Another dodecagonal wheel.
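
A rough sketch of that queue idea, assuming the pool is pre-filled with one index per instance. Here `queue.Queue` from the standard library stands in for the cloud-managed queue, and `make_index_pool` and `claimed_index` are hypothetical names; a real setup would need a managed service with atomic pop semantics shared across machines, which this in-process version does not provide.

```python
import queue
from contextlib import contextmanager


def make_index_pool(parallelism: int) -> queue.Queue:
    """Pre-fill the pool with one index per instance (0 .. parallelism - 1)."""
    pool = queue.Queue()
    for index in range(parallelism):
        pool.put(index)
    return pool


@contextmanager
def claimed_index(pool: queue.Queue):
    """Pop an index at "boot" and push it back at "termination"."""
    index = pool.get_nowait()  # raises queue.Empty if every index is taken
    try:
        yield index
    finally:
        pool.put(index)  # return the index so a replacement can reuse it


pool = make_index_pool(8)
with claimed_index(pool) as first, claimed_index(pool) as second:
    # Two concurrently "running" instances never share an index.
    assert first != second
assert pool.qsize() == 8  # both indices were returned on exit
```

Returning the index on exit is what lets a replacement instance pick up the slot of one that terminated; an instance that dies without pushing its index back would leak it, which this sketch does not handle.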

0x2b3bfa0 avatar May 19 '22 02:05 0x2b3bfa0

Another hacky possibility: two instance groups, one for the leader instance and another for the workers.

0x2b3bfa0 avatar May 20 '22 06:05 0x2b3bfa0

Re-commenting here for better context.

I came across this PR while looking for this feature with AWS EC2. I think the ability to operate parallel instances with regular cloud providers and have some sort of indexing, or any mechanism, to dispatch work to the different instances can greatly help small teams and individual developers who don't have resources to manage k8s.

Originally posted by @redabuspatrol in https://github.com/iterative/terraform-provider-iterative/issues/597#issuecomment-1183537070

redabuspatrol avatar Jul 14 '22 13:07 redabuspatrol