Manage computation nodes

Open ggreg opened this issue 10 years ago • 0 comments

While adding the decider and activity worker processes, I was thinking that anything that could help bootstrap the execution of a workflow would be a great improvement. This led to the addition of simpleflow commands to start, track, and stop an execution. One of the most helpful commands for defining and testing a workflow is simpleflow standalone. In a single command, one can execute the whole workflow by combining the decider and activity processes under a single process. However, this command does not scale to a distributed deployment.

The purpose of the feature described here is to provide an interface to manage computing resources. The main artifact is the node, which refers to a machine inside a group of machines. Speaking of a cluster may suggest that all nodes are the same, or at least similar. However, we want to be able to support different types of computation and to easily provide the right resources: CPU, memory, storage, and network.

A node requires a snapshot of its initial state, commonly called an image. In AWS parlance, this is an AMI: the filesystem snapshot used to boot an EC2 instance.
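To make the idea of a node more tangible, here is a minimal sketch of how a node description could look. `NodeSpec`, its field names, and the `fits` helper are hypothetical and not part of simpleflow; the image ID and resource values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of the image and resources a node needs."""

    image: str        # snapshot used to boot the node (an AMI ID on EC2)
    cpus: int         # number of CPU cores
    memory_mb: int    # RAM in megabytes
    storage_gb: int   # local storage in gigabytes
    network: str      # free-form network class, e.g. "standard" or "high"

    def fits(self, available: "NodeSpec") -> bool:
        """Return True if `available` offers at least the CPU, memory,
        and storage requested by this spec."""
        return (
            self.cpus <= available.cpus
            and self.memory_mb <= available.memory_mb
            and self.storage_gb <= available.storage_gb
        )


# Two node types for different kinds of computation (made-up values).
cpu_bound = NodeSpec(image="ami-placeholder", cpus=16, memory_mb=8192,
                     storage_gb=50, network="standard")
memory_bound = NodeSpec(image="ami-placeholder", cpus=4, memory_mb=65536,
                        storage_gb=50, network="standard")
```

Describing nodes by their requirements rather than by a fixed machine type is one way to avoid assuming that all nodes in the group are alike.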

Though there are libraries that abstract away the specific cloud hosting provider, such as libcloud, I prefer to start by experimenting with Amazon EC2. That does not mean I want to hard-code everything. The EC2 backend will register itself as one of several hosting backends. At first it will be the only one, but feel free to add any other backend you need or like.
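One possible shape for this self-registration is a small registry keyed by backend name. This is only a sketch: `register_backend`, `get_backend`, `HostingBackend`, and `EC2Backend` are hypothetical names, and the EC2 class below is a stand-in that keeps nodes in memory instead of calling the EC2 API.

```python
import itertools
from abc import ABC, abstractmethod

# Maps a backend name to its class. Filled in by @register_backend.
_BACKENDS = {}


def register_backend(name):
    """Class decorator that registers a hosting backend under `name`."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


class HostingBackend(ABC):
    """Hypothetical interface every hosting backend would implement."""

    @abstractmethod
    def create_node(self, image):
        """Boot a node from `image` and return its identifier."""

    @abstractmethod
    def terminate_node(self, node_id):
        """Terminate the node identified by `node_id`."""


@register_backend("ec2")
class EC2Backend(HostingBackend):
    """Placeholder: a real backend would call the EC2 API here."""

    def __init__(self):
        self._nodes = {}
        self._ids = itertools.count(1)

    def create_node(self, image):
        node_id = "node-%d" % next(self._ids)
        self._nodes[node_id] = image
        return node_id

    def terminate_node(self, node_id):
        self._nodes.pop(node_id, None)


def get_backend(name):
    """Instantiate the backend registered under `name`."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError("unknown hosting backend: %r" % name)
```

With this layout, adding another provider would only require a new class decorated with `@register_backend("...")`; the calling code keeps using `get_backend`.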

How to use this interface? After reading the paragraphs above, you may be thinking it's a little abstract. Let's take a concrete example of how to use it.

To execute a workflow, we need three processes:

  • A decider
  • An activity worker
  • A client to start the execution

All these processes are stateless. The decider and activity worker poll for tasks, which means they do not need to be running when the client starts a workflow or SWF triggers an event. We can leverage this behavior to create a group of processes, and the machines supporting them, on demand:

  • The client asks for a decider
  • If there is no instance running a decider process, it creates an EC2 instance
  • The client starts the workflow execution
  • A decider takes the decision tasks and executes the workflow definition. This definition requires activity workers and instantiates them. It has to map activities to their corresponding task lists.

How to know when to terminate the instances? The simple way is to wait for the termination of the workflow execution. This way, there is no risk of terminating an instance while it is still needed for an incoming activity task. However, it does not use resources efficiently. EC2 bills by the hour. We should consider the length of the period arbitrary. What matters is whether we are within the current period, i.e. the activity completed in less than an hour, or beyond it. Whenever we start using a new hour, it is better to keep using it for incoming activity tasks. The instance should be terminated when no more activities will be scheduled.
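The termination rule could be sketched as follows. This is one possible interpretation, not a settled design: the function names, the `busy` and `more_expected` flags, and the 120-second safety margin are all assumptions, and the hourly period is taken from the billing model described above.

```python
# Length of one billing period in seconds (hourly billing, as described
# above). The text treats the exact length as arbitrary, so it stays
# configurable through the `period` argument.
BILLING_PERIOD_SECONDS = 3600


def seconds_left_in_period(launched_at, now, period=BILLING_PERIOD_SECONDS):
    """Seconds remaining in the billing period that is already paid for."""
    elapsed = now - launched_at
    return period - (elapsed % period)


def should_terminate(launched_at, now, busy, more_expected,
                     margin=120, period=BILLING_PERIOD_SECONDS):
    """Decide whether an instance should be terminated now.

    busy:          the instance is currently running an activity task.
    more_expected: the workflow may still schedule activities for it.
    margin:        assumed safety margin (seconds) before the period ends.
    """
    if busy:
        # Never terminate an instance that is executing a task.
        return False
    if not more_expected:
        # No activity will be scheduled anymore: release the instance.
        return True
    # Idle, but more work may come: keep the instance for the rest of the
    # period that is already paid for, and release it just before a new
    # period would start.
    return seconds_left_in_period(launched_at, now, period) <= margin
```

For example, an idle instance launched at `t=0` is kept at `t=1800` if more activities may arrive, since the rest of the hour is already paid for. The same instance at `t=3500` would be released before a new hour begins. If that last rule is too aggressive, it can be dropped so that only the `more_expected` check decides.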

ggreg, Aug 18 '15 12:08