basic architecture doc
Thanks for opening this up and the nice docs so far. I am wondering if a basic architecture doc focused on infrastructure and runtime can be produced. Things that come to mind:
- how does client job submission work?
- what persistent server processes are there, and what information is persisted?
- what agents run on EC2 instances
- what are the requirements for the instance (a particular AMI, certain software), is there a way to SSH to them? Can new software (e.g. monitoring containers) be layered on top?
- how do the docker jobs get submitted?
- how are the job results relayed on completion?
- how does the runtime handle downloading S3 files in detail (does it wait to receive the full file)?
- do instances maintain caches of files from previous runs or do they get wiped?
Hi @gregwebs, thanks for your comment. We have such a document internally, and I plan on formatting it for external use (and also making sure it's up to date with reality).
Also, to address some of your specific questions:
> how does client job submission work?

Reflow does not do asynchronous job submission. Think of Reflow like you would a programming language interpreter (e.g., Python): you run `reflow run foo.rf` to run your program. It just happens to be able to use a large EC2 cluster to do its work (see the sketch below). Asynchronous submission is an orthogonal concern; that being said, we are planning on providing such a feature natively in Reflow.

> what persistent server processes are there, and what information is persisted?

For each instance that Reflow starts, there is currently only one persistent process: the reflowlet (Reflow's "agent" process), which is responsible for managing the machine's resources and coordinating requests from multiple Reflow evaluators. The only information that is persisted is cache metadata and data, if configured.

> what agents run on EC2 instances?

See above.
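To make the synchronous model concrete, here is a minimal workflow in the style of the README's hello world (file name hypothetical):

```
// hello.rf: a one-exec workflow.
val Main = exec(image := "ubuntu", mem := GiB) (out file) {"
	echo hello world >> {{out}}
"}
```

Running `reflow run hello.rf` blocks until evaluation completes (provisioning EC2 capacity along the way as needed) and then prints the value `Main` evaluates to.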
> what are the requirements for the instance (a particular AMI, certain software), is there a way to SSH to them? Can new software (e.g. monitoring containers) be layered on top?

The requirements are that Docker is available, that the AMI uses systemd, and that it can be configured via cloud-config. In its default configuration, Reflow uses the standard CoreOS system images. You can ssh into them (sketched below): when you run `reflow setup-ec2`, it captures your public ssh key in the configuration that is produced. This is spelled out in the README: "You can ssh into the host in order to inspect what's going on. Reflow launched the instance with your public SSH key (as long as it was setup by reflow setup-ec2, and $HOME/.ssh/id_rsa.pub existed at that time)."

> how do the docker jobs get submitted?

The reflowlets communicate directly with the docker daemons running on the instances.
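To make the ssh workflow above concrete (the host address is a placeholder; `core` is the standard CoreOS login user):

```
# One-time setup: generates Reflow's configuration, capturing
# $HOME/.ssh/id_rsa.pub so that launched instances accept your key.
$ reflow setup-ec2

# Later, inspect a running worker directly
# (address as reported in the EC2 console).
$ ssh core@ec2-203-0-113-10.compute-1.amazonaws.com
```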
> how are the job results relayed on completion?

When a Reflow program completes, the value that `Main` evaluates to is printed. However, this is seldom useful by itself. Usually the top-level `Main` is some side-effecting computation, typically copying the results to some desired destination. See 1000align for an example. We have some (I think) exciting plans to address this, too, by allowing Reflow modules to expose virtual file trees directly, and making file operations available via the `reflow` command.
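For illustration, a side-effecting `Main` might look like the following sketch, modeled on 1000align and assuming the `$/files` system module's `Copy` function (all other names are hypothetical):

```
param (
	// dest is the destination URL, e.g. "s3://my-bucket/hello.txt".
	dest string
)

val files = make("$/files")

// A stand-in for the real computation: any expression producing a file.
val hello = exec(image := "ubuntu", mem := GiB) (greeting file) {"
	echo hello world >> {{greeting}}
"}

// The program's result is the copy's side effect, not a printed value.
val Main = files.Copy(hello, dest)
```

As in the 1000align walkthrough, the parameter is supplied as a flag, e.g. `reflow run copy.rf -dest s3://my-bucket/hello.txt`.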
> how does the runtime handle downloading S3 files in detail (does it wait to receive the full file)?

Yes, Reflow currently runs an exec only when its prerequisites are fully evaluated. The current contract is that a `file` is a file, and a `dir` is a directory. We're working on introducing a form of streaming into Reflow so that an `exec` may receive (and produce) data incrementally. This would mean that files are materialized into Unix streams instead.
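For example (assuming Reflow's `file` literal syntax for S3 URLs; the bucket and key are hypothetical), the input below is materialized in full before the exec starts, so the command sees an ordinary local file rather than a stream:

```
// The S3 object is downloaded completely before the exec runs;
// {{input}} is then a regular file inside the container.
val input = file("s3://my-bucket/path/to/input.txt")

val Main = exec(image := "ubuntu", mem := GiB) (out file) {"
	wc -l {{input}} > {{out}}
"}
```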
> do instances maintain caches of files from previous runs or do they get wiped?

They do, for some time. File repositories are associated with specific allocs (resource allocations) on the individual machines, and their lifecycle is tied to the alloc's lifecycle. Allocs are collected only when they become idle (i.e., no evaluator is maintaining a lease on the alloc) and their resource reservations are needed by another evaluator. It's possible to explicitly reuse an alloc by passing the `-alloc` flag to Reflow's `run` command, but this is a little too much inside baseball for most users. We plan on making this behave "as expected" by a different mechanism: when an alloc is retired, instead of also removing its file repository, it is subsumed into an instance file repository. This becomes similar to a look-aside cache: when cache transfers are performed, we first look in the instance's repository before transferring files from the (remote) cache. The additional complication here is that we'll need to garbage-collect the instance repository whenever extra disk space is needed on the instance.
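For completeness, explicit alloc reuse would look something like the sketch below (the alloc identifier is a placeholder, and the flag placement assumes the usual `reflow run` flag syntax):

```
# Pin a run to an existing alloc so that its file repository is reused.
$ reflow run -alloc <alloc-id> foo.rf
```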
thanks for the detailed response! Everything makes sense here except for the first one about async submission. What happens if I tell reflow to run and then Ctrl-C?
Reflow behaves just like any other program interpreter: Ctrl-C stops execution. Of course, because Reflow is incremental, and intermediate reductions are cached, if you start it up again, it's (most of the time) able to pick up from where it stopped.
Also to clarify: when you hit Ctrl-C, Reflow will fail to maintain keepalives to the worker instances; this will cause them to automatically terminate after 10 minutes of idleness. This is also noted in the README:
> Reflow comes with a built-in cluster manager, which is responsible for elastically increasing or decreasing required compute resources. The AWS EC2 cluster manager keeps track of instance type availability and account limits, and uses these to launch the most appropriate set of instances for a given job. When instances become idle, they will terminate themselves if they are idle for more than 10 minutes; idle instances are reused when possible.
Does it catch Ctrl-C and send a stop request to the agent, or is reflow sending keepalives to the agent? Ctrl-C is maybe not the best example because it can be caught. What happens when I lose all network connectivity from where I am running reflow? If I close my laptop, does my job stop? So it works a lot better when submitted from a tmux session on an EC2 workstation?
Thanks for explaining. Idle instance termination is really a separate concern from this question since another job may be scheduled to the instance, so I don't think the README statement explains this aspect.
Reflow maintains a keepalive to all of its resource allocations. Thus if the job is killed in any way, or loses connectivity, the keepalive is no longer maintained. An instance is considered idle if none of its allocs are alive.
Thanks for all the explanations! I will close this, but you could re-open it if it serves as a useful reminder to add an architecture doc.
I am quite surprised by some aspects of the current operational architecture, although I see how it was probably the easiest way to get things to a good enough state. I use a more centralized job flow controller that I could try connecting this to in order to achieve an async workflow, etc.
The design may be unusual but it's not done out of ease or convenience. This design lets us very easily control the whole compute stack, and provision resources as they are needed—the cloud is elastic after all, and Reflow fully exploits this.
With respect to synchronous vs. asynchronous execution, I view this as an orthogonal concern: an external system can be responsible for managing individual Reflow jobs. (This is what we do at GRAIL.)
This split of responsibilities keeps things both orthogonal and simple: Reflow interprets and executes Reflow programs on a cloud provider; a separate system is responsible for managing a set of jobs.
Sorry, I may have misunderstood your description; going off an architecture doc would make things clearer :)
It also makes a lot more sense now that you do have a system for async execution. It would be great to see the architecture of that too, even if there is no implementation provided.
The code is beautiful. Just want to ask if you plan to publish an architecture doc soon. Thanks!
Hi @chenpengcheng, thanks! Yes, this is getting closer to the top of my list...
> Asynchronous submission is an orthogonal concern: that being said, we are planning on providing such a feature natively in Reflow.

> I view this as an orthogonal concern: an external system can be responsible for managing individual Reflow jobs. (This is what we do at GRAIL.)
@mariusae Is this still on the list and still supposed to go in Reflow core?
As always, very interesting project!
@hchauvin yep, @prasadgopal is working on this actively right now :-)
Awesome, thanks!