RFC: Assign roles to nodes to control distribution

Open • gofoss opened this issue 7 years ago • 13 comments

Not sure if this belongs here or is more of a libcluster concern, but it would be great if one could assign roles to nodes and then start worker processes only on nodes of a particular role.

Thanks!

gofoss • Nov 12 '17

This is a great idea, though it would require some significant changes. I'll definitely consider it if there is enough feedback in favor of it.

bitwalker • Feb 07 '18

I agree. I had a case where I wanted two types of nodes: web and engine. I wanted tasks to run on engine nodes but to be started from web nodes (they were usually triggered by calls from controllers). To do that, I had to add some custom routing calls to the nodes and then call the Swarm functions directly on some of the engine nodes. It would be great to be able to call Swarm from those "blacklisted" nodes too.

kelostrada • Feb 08 '18

Swarm will never support calling functions from blacklisted nodes, since that feature exists to ensure Swarm doesn't attempt to work with those nodes at all. But if by blacklisted you mean nodes of the web role being able to start processes on nodes of the engine role (using your example), that should certainly be doable once roles are supported: starting a process on a node of a particular role would likely just be a matter of specifying the role that process belongs to.

My intent would be to use Kubernetes terminology for this, so rather than "roles", nodes and processes would support "labels", and the combination of labels applied would yield the subset of nodes that are allowed to host a given process.

bitwalker • Feb 09 '18
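To make the label idea concrete, here is a minimal Python sketch (Swarm itself is Elixir, and nothing below is Swarm's or libring's API). It assumes Kubernetes-style selector semantics, where a node is eligible only if its labels contain every key/value pair the process asks for, and then picks one eligible node with rendezvous (highest-random-weight) hashing as a stand-in for Swarm's hash ring. The node names, labels, and the `eligible_nodes`/`place` helpers are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional

# Hypothetical cluster: node name -> labels (Kubernetes-style key/value pairs).
NODES: Dict[str, Dict[str, str]] = {
    "web1@host": {"role": "web"},
    "web2@host": {"role": "web"},
    "engine1@host": {"role": "engine", "zone": "a"},
    "engine2@host": {"role": "engine", "zone": "b"},
}


def eligible_nodes(nodes: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Return the nodes whose labels contain every key/value pair in selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


def _score(process_name: str, node: str) -> int:
    # Stable per-(process, node) weight; built-in hash() is salted per run, so use sha256.
    digest = hashlib.sha256(f"{process_name}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(process_name: str, selector: Dict[str, str],
          nodes: Dict[str, Dict[str, str]] = NODES) -> Optional[str]:
    """Pick one eligible node for process_name via rendezvous hashing, or None."""
    candidates = eligible_nodes(nodes, selector)
    if not candidates:
        return None  # no node carries the required labels
    return max(candidates, key=lambda node: _score(process_name, node))


if __name__ == "__main__":
    print(eligible_nodes(NODES, {"role": "engine"}))  # ['engine1@host', 'engine2@host']
    print(place("worker:42", {"role": "engine"}))     # one of the two engine nodes
```

Because the choice depends only on the process name and the eligible set, any node (a web node included) can compute where an engine process should live, which covers the web/engine case from the earlier comment.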

I have a possibly related need: to create groups of processes and ensure that each process lives on a separate node in a different availability zone. Would this work make such a thing possible, or should I raise it elsewhere?

bryanhuntesl • Feb 09 '18

Labels would certainly be a solution to that problem. As it stands, I haven't tested Swarm with nodes that are geographically split across regions, so I would definitely be interested to hear of any experiences there, and whether any changes are required to handle that setup better.

bitwalker • Feb 09 '18
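For the availability-zone case, labels alone only say which nodes are allowed; spreading a group also needs an anti-affinity rule. A minimal Python sketch, assuming each node's zone is already known (how it would be discovered is left open) and that the hypothetical `spread_across_zones` helper may simply refuse a group larger than the number of distinct zones:

```python
from typing import Dict, List


def spread_across_zones(group: List[str], node_zones: Dict[str, str]) -> Dict[str, str]:
    """Map each process in group to a node, using a different zone for every process.

    Hypothetical helper: node_zones maps node name -> availability zone label.
    Raises ValueError when the group is larger than the number of distinct zones.
    """
    # Keep one representative node per zone (lexicographically first, for determinism).
    by_zone: Dict[str, str] = {}
    for node in sorted(node_zones):
        by_zone.setdefault(node_zones[node], node)

    zones = sorted(by_zone)
    if len(group) > len(zones):
        raise ValueError(f"group of {len(group)} needs {len(group)} zones, only {len(zones)} available")

    # zip pairs the i-th process with the i-th zone, so no two processes share a zone.
    return {proc: by_zone[zone] for proc, zone in zip(group, zones)}


if __name__ == "__main__":
    example = {"n1": "eu-west-1a", "n2": "eu-west-1a", "n3": "eu-west-1b", "n4": "eu-west-1c"}
    print(spread_across_zones(["replica-0", "replica-1", "replica-2"], example))
    # {'replica-0': 'n1', 'replica-1': 'n3', 'replica-2': 'n4'}
```

This greedy version only covers initial placement; it does not rebalance when a node or a whole zone goes away, which is exactly where real cross-region experience would be needed.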

"if by blacklisted you mean nodes of the web role being able to start processes on nodes of engine role"

Yeah, that's what I meant; I just mentioned it as one more vote for this feature ;)

kelostrada • Feb 09 '18

Somewhat late to the party, but 👍 from me.

I'm looking to have an assembly of components, spread across nodes and servers. Some servers may be optimized for database work, others for networking, and so on.

I'd like to be able to create node_type labels for nodes, and worker_type labels for processes, and then to say "worker db_reader runs on database_nodes" etc.

Bonus points for allowing many to many associations :)

pragdave • Mar 08 '18
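The many-to-many variant can be modeled as two mappings: node to node types, and worker type to the node types it may run on. A worker may then be hosted by any node that shares at least one type with it. This is a hypothetical Python sketch; `NODE_TYPES`, `RUNS_ON`, and `hosts_for` are illustrative names, not part of Swarm.

```python
from typing import Dict, List, Set

# Hypothetical: a node may carry several node types (the "many" on the node side).
NODE_TYPES: Dict[str, Set[str]] = {
    "db1@host": {"database"},
    "net1@host": {"networking"},
    "big1@host": {"database", "networking"},
}

# Hypothetical: a worker type may run on several node types (the "many" on the worker side).
RUNS_ON: Dict[str, Set[str]] = {
    "db_reader": {"database"},
    "proxy": {"networking"},
    "reporter": {"database", "networking"},
}


def hosts_for(worker_type: str) -> List[str]:
    """Nodes sharing at least one node type with what worker_type is allowed to run on."""
    allowed = RUNS_ON.get(worker_type, set())
    # Set intersection: any overlap between the node's types and the allowed types qualifies.
    return sorted(node for node, types in NODE_TYPES.items() if types & allowed)


if __name__ == "__main__":
    print(hosts_for("db_reader"))  # ['big1@host', 'db1@host']
    print(hosts_for("reporter"))   # ['big1@host', 'db1@host', 'net1@host']
```

The matching rule here is "any overlap", unlike an all-labels-must-match selector; that looser rule is what makes the association many-to-many.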

I'm copying my comment from another issue, as it might be more relevant here:

@beardedeagle I really like the idea of roles, but I think this "tagging" could be done more simply using libring's configuration.

I would suggest that register_name accept a ring option. If the ring exists in libring's config, it must be used; otherwise, all the nodes would be used.

hickscorp • Apr 02 '18
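The ring-option suggestion boils down to: look up the named ring if one is configured, otherwise fall back to all nodes, then hash the name onto that ring. Below is a minimal Python sketch of that placement decision only. `HashRing` is a generic consistent-hash ring with virtual nodes written for illustration (it is not libring; its `key_to_node` method merely borrows the name used later in this thread), and `ALL_NODES`, `RINGS`, and `node_for` are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List, Optional


def _hash(value: str) -> int:
    # Stable 64-bit hash (built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes; an illustration, not libring."""

    def __init__(self, nodes: List[str], replicas: int = 64) -> None:
        # Each node appears `replicas` times on the ring to even out the key distribution.
        self._points = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas))
        self._positions = [position for position, _ in self._points]

    def key_to_node(self, key: str) -> Optional[str]:
        """Walk clockwise from the key's hash to the next point; wrap around at the end."""
        if not self._points:
            return None
        index = bisect.bisect(self._positions, _hash(key)) % len(self._points)
        return self._points[index][1]


# Hypothetical configuration: every node, plus named rings in the spirit of libring's config.
ALL_NODES = ["a@host", "b@host", "c@host", "d@host"]
RINGS: Dict[str, List[str]] = {"engine": ["c@host", "d@host"]}

_DEFAULT_RING = HashRing(ALL_NODES)
_NAMED_RINGS = {name: HashRing(members) for name, members in RINGS.items()}


def node_for(name: str, ring: Optional[str] = None) -> Optional[str]:
    """Hypothetical placement behind a `ring` option: use the named ring if it is
    configured, otherwise fall back to a ring of all nodes."""
    return _NAMED_RINGS.get(ring, _DEFAULT_RING).key_to_node(name)


if __name__ == "__main__":
    print(node_for("worker:1", ring="engine"))  # c@host or d@host
    print(node_for("worker:1"))                 # any of the four nodes
```

Unknown ring names silently fall back to the full node list here, matching the "otherwise all the nodes" rule; a stricter design could reject them instead.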

@hickscorp this is a great idea, I think this could definitely work. IMHO, using rings would be the simplest implementation of a "node roles" concept.

arjan • Jul 20 '18

That looks elegant. I am just wondering how that correlates with libcluster's strategies, like the EC2 or GCE ones with tagged node discovery.

gofoss • Jul 20 '18

Exactly... I am studying the code right now.

libcluster is somewhat different from this, I think; libcluster only ensures that the cluster is fully connected.

libring can automatically fill its rings with discovered nodes (monitor_nodes: true), but it seems that Swarm only adds nodes to its (currently single) ring when the Swarm application has started.

Swarm's Strategy code currently seems to serve multiple purposes:

  • ring management (add / remove node), basically proxying to libring
  • key_to_node quorum decision making

arjan • Jul 20 '18

Another take on this might be to create multiple registries. Right now, Swarm is a single, distributed registry. Instead of using roles/labels or multiple rings, we could start multiple Swarm registry instances, analogous to Elixir's built-in Registry (which is not distributed, but can be started as several independent, named instances).

arjan • Jul 21 '18
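The multiple-registries alternative keeps each registry's membership and name table separate, much like starting several named instances of Elixir's Registry. Here is a hypothetical Python model: each `ClusterRegistry` hashes names only over its own nodes (rendezvous hashing, chosen here for brevity), so the same name can exist independently in two registries. The class and method names are illustrative, not Swarm's API.

```python
import hashlib
from typing import Dict, List, Optional


class ClusterRegistry:
    """Hypothetical model of one independent registry with its own member nodes."""

    def __init__(self, name: str, nodes: List[str]) -> None:
        self.name = name
        self.nodes = sorted(nodes)
        self._names: Dict[str, str] = {}  # registered name -> hosting node

    def _pick(self, key: str) -> str:
        # Rendezvous hashing over this registry's members only.
        return max(self.nodes, key=lambda node: hashlib.sha256(f"{key}|{node}".encode()).digest())

    def register(self, key: str) -> Optional[str]:
        """Register key and return its node, or None if taken here or there are no members."""
        if key in self._names or not self.nodes:
            return None
        node = self._pick(key)
        self._names[key] = node
        return node

    def whereis(self, key: str) -> Optional[str]:
        return self._names.get(key)


if __name__ == "__main__":
    web = ClusterRegistry("web", ["web1@host", "web2@host"])
    engine = ClusterRegistry("engine", ["engine1@host", "engine2@host"])
    print(engine.register("job:1"))  # an engine node
    print(web.whereis("job:1"))      # None: the registries do not share names
```

The trade-off versus labels is that a caller must know which registry to ask, whereas a label selector lets one registry answer for every role.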

This may actually relate to the issue I just posted: https://github.com/bitwalker/swarm/issues/95, which illustrates a challenge I'm having running separate applications using Swarm. Having multiple Swarm instances side by side, or being able to add rules/labels/roles to ensure that certain components only run on nodes that are running their application, would be very useful.

zachdaniel • Jul 24 '18