
consul: support for admin partitions

shoenig opened this issue 2 years ago

Consul 1.11 added support for admin partitions.

Similar to the work done for adding support for Consul namespaces, Nomad should also add support for Consul admin partitions.

shoenig avatar May 26 '22 17:05 shoenig

(EDIT: @davidfleming's reply below spells out some issues with this suggestion)

Wanted to comment on the current state of things for anybody interested. Currently, Nomad doesn’t “know” about Admin Partitions, but it still works with them. There’s nothing that should break using the two together assuming you split up nodes and workloads properly. The question is, how do you split that up? The rough steps would be something like:

  • Step 1: Get Consul Admin Partitions running and split up your Consul nodes into partitions as you see fit.
  • Step 2: Tag the nodes in Nomad appropriately so that you can send specific workloads to specific nodes. I think the easiest way to do this is to use datacenters. If Admin Partitions map directly to Nomad datacenters, then it is really easy to know which Nomad node maps to which admin partition.
  • Step 3: When deploying Nomad jobs, make sure that you send in the right Consul creds for the right Admin Partition, i.e. pass in a Consul token that has access to that partition (see the sketch after this list).
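
Here's a rough sketch of what that could look like, assuming a hypothetical job named "web" and that the Nomad clients whose local Consul agents sit in the "staging" partition have been registered in a Nomad datacenter that is also named "staging":

# Hypothetical jobspec pinned to the clients that map to the "staging" partition.
job "web" {
  # Nomad datacenter chosen to mirror the Consul admin partition 1:1.
  datacenters = ["staging"]

  group "web" {
    network {
      port "http" {
        to = 80
      }
    }

    # Registers into whichever partition the local Consul agent belongs to.
    service {
      name = "web"
      port = "http"
    }

    task "app" {
      driver = "docker"

      config {
        image = "nginx:1.25"
        ports = ["http"]
      }
    }
  }
}

You'd then submit the job with a Consul token that has access to that partition, e.g. via the -consul-token flag on nomad job run or the CONSUL_HTTP_TOKEN environment variable.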

Notes:

  • You don’t have to use datacenters to do this split. You could technically just use node_class or metadata to define each partition and then make sure Nomad workloads are placed onto the right nodes with those (see the sketch after these notes). In my opinion, this is more error-prone.
  • Splitting up Nomad clusters would also work. Each Admin partition could be a different Nomad cluster.
  • Nomad might introduce the concept of “node pools” at some point which would function somewhat like datacenter in terms of how you can split nodes up. This might eventually become the suggested way.
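
For the node_class variant, a rough sketch, assuming the clients attached to the "staging" partition were started with node_class = "staging":

# Nomad client config on hosts whose local Consul agent is in the "staging" partition:
client {
  enabled    = true
  node_class = "staging"
}

# In the jobspec, steer the workload with a constraint instead of datacenters:
constraint {
  attribute = "${node.class}"
  value     = "staging"
}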

More official support for admin partitions is something that we've looked at briefly, but not explored in depth. If you have a use case where Nomad having first-class knowledge of admin partitions would be helpful, please let us know. A common use case where splitting by datacenter, cluster, or node_class is not sufficient would help us make the case for this internally.

mikenomitch avatar Oct 28 '22 17:10 mikenomitch

Hi @mikenomitch

First thank you for commenting on this. We have been looking forward to seeing progress on this and a discussion on the direction this will potentially take would be appreciated.

Our use case:

First class support for Admin Partitions would be very useful for what we are trying to accomplish. In particular, we are looking to separate workloads by jobspec and bin pack on a cluster of clients. What that looks like for us is a breakdown of "production" vs "non-production" clients. The "non-production" workloads will consist of N environments (think dev, staging, test, PR-123, etc). Ideally we would like to be able to easily spin environments up and down by bin packing them into existing clusters. The pattern you mentioned would require us to spin up new Consul clients for each environment. While we could hack this together with namespaces, it would become cumbersome and would not provide the level of isolation we are looking for (i.e. the environments should not gossip or talk to each other).

We did start down this route as an interim step but have already hit some roadblocks implementing it.

Some technical problems with the steps you listed:

  1. When the Nomad client tries to query the Consul agent, it complains about a partition mismatch. For example, with the Consul agent configured for a "staging" partition and the Nomad client defaulting to "default" since it is unaware of partitions:

nomad[392012]: 2022-10-28T17:46:26.521Z [ERROR] consul.sync: still unable to update services in Consul: failures=10 error="failed to query Consul services: Unexpected response code: 400 (request targets partition "default" which does not match agent partition "staging")"

  2. If the Nomad client ends up using the partition of the Consul agent, will the functionality of client_auto_join still work? I.e., will the Nomad servers be registered in the default partition while the Nomad client looks for them in the wrong partition (that of the client)?

Thanks, David

davidfleming avatar Oct 29 '22 08:10 davidfleming

Hey @davidfleming, we were looking into this on the Nomad engineering team, and unfortunately to "do it right" it'll take more effort than we have time for in the near future. We do plan to get to it, but not in the next month or two.

In the meantime, I noticed that there's a CONSUL_PARTITION env var you can set. Nomad itself isn't setting "default" so I think if you set that wherever you run the Consul client, that should fix the first problem you noted.

I think this should solve problem 2 as well, but to be honest I'm not sure.

Sorry about the delay on all of this, but hope that workaround works 🤞

mikenomitch avatar Jan 27 '23 16:01 mikenomitch

Here's our plan for implementing Admin Partitions support in Nomad. I'll break this down into three sections.

Fingerprinting

A Consul Enterprise agent can belong to exactly one partition. We require that each Nomad agent has its own Consul agent (if you're using Consul), so the partition becomes an easy target for us to fingerprint. You then immediately get two options for allocating Nomad workloads to Consul partitions (sketched after this list):

  • Job authors can add a constraint to their job on the attribute attr.consul.partition (or attr.consul.$clusterName.partition for non-default clusters).
  • Cluster administrators can set a 1:1 relationship between Consul partitions and Nomad node pools by having the Consul agent configured for the appropriate partition on the nodes where Nomad is in a particular node pool. (Ex. you can set Nomad agent node_pool = "prod" and the Consul agent partition = "prod".)
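
An abridged sketch of both options, using a hypothetical "prod" partition (attr.consul.partition is the attribute added by the fingerprinting work described in this section):

# Option 1: constrain a job to clients whose local Consul agent is in the "prod" partition.
job "api" {
  constraint {
    attribute = "${attr.consul.partition}"
    value     = "prod"
  }
  # ...
}

# Option 2: pair a Nomad node pool with a Consul partition 1:1.
# Nomad client config:
client {
  node_pool = "prod"
}

# Consul Enterprise agent config on the same host:
partition = "prod"

# Jobs that should land in that partition then set:
# node_pool = "prod"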

The partition is exposed in Consul's existing /v1/agent/self endpoint, so implementing fingerprinting turns out to be fairly trivial. One minor annoyance is that the API returns .Config.Partition = "default" only if the partition is explicitly set, not when the agent is simply using the default. So when we fingerprint, we'll check the SKU and, if it's Consul Enterprise, fill in the default partition if it's missing.

The fingerprinting work turns out to be trivial, so I've got a draft PR up for that here https://github.com/hashicorp/nomad/pull/19485

fingerprinting output

Consul CE agent (no partitions):

$ curl -s "http://localhost:8500/v1/agent/self" | jq .Config
{
  "Datacenter": "dc1",
  "PrimaryDatacenter": "dc1",
  "NodeName": "nomad0",
  "NodeID": "ec86d276-1c51-edb0-ad58-c79ec07f07e2",
  "Revision": "61547a41",
  "Server": false,
  "Version": "1.13.6",
  "BuildDate": "2023-01-26T15:59:13Z"
}
$ nomad node status -verbose -self | grep consul
consul.connect                      = true
consul.datacenter                   = dc1
consul.ft.namespaces                = false
consul.grpc                         = 8502
consul.revision                     = 61547a41
consul.server                       = false
consul.sku                          = oss
consul.version                      = 1.13.6
unique.consul.name                  = nomad0

Consul Enterprise agent with non-default partition:

$ curl -s "http://localhost:8500/v1/agent/self" | jq .Config
{
  "Datacenter": "dc1",
  "PrimaryDatacenter": "dc1",
  "NodeName": "nomad0",
  "NodeID": "ec86d276-1c51-edb0-ad58-c79ec07f07e2",
  "Partition": "example",
  "Revision": "d6969061",
  "Server": false,
  "Version": "1.16.0+ent",
  "BuildDate": "2023-06-26T20:27:46Z"
}
$ nomad node status -verbose -self | grep consul
consul.connect                      = true
consul.datacenter                   = dc1
consul.ft.namespaces                = true
consul.grpc                         = 8502
consul.partition                    = example
consul.revision                     = d6969061
consul.server                       = false
consul.sku                          = ent
consul.version                      = 1.16.0+ent
unique.consul.name                  = nomad0

Jobspec

Next, we can add a partition field to the consul block in the jobspec.

If consul.partition is set in the job, we'd add an implicit constraint in one of the job mutating hooks we already have for Consul (either job_endpoint_hook_consul_ce.go#L92 or more likely job_endpoint_hooks.go#L179).
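
An abridged sketch of what the jobspec side could look like (the partition field here is the proposed addition, not an existing option; the "example" partition matches the fingerprinting output above):

job "api" {
  group "api" {
    # Proposed field: setting a partition here would cause the mutating hook
    # to inject an implicit constraint on attr.consul.partition, so the group
    # only lands on clients whose Consul agent is in that partition.
    consul {
      partition = "example"
    }

    task "server" {
      # ...
    }
  }
}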

Enterprise Considerations

Consul admin partitions are a Consul Enterprise feature, so at first glance it would make sense to restrict this option to Nomad Enterprise as well. But we currently allow users to set a Consul namespace for their Nomad CE cluster, and this feature maps rather directly to that. Once the fingerprinting is added, adding the constraint is trivial for any user, so restricting just the jobspec portion of this to ENT wouldn't make sense either. So this feature will be fully implemented in Nomad Community Edition.

tgross avatar Dec 14 '23 19:12 tgross

Fingerprinting has been merged and will ship in the next regular 1.7.x release of Nomad (most likely 1.7.3).

tgross avatar Dec 15 '23 14:12 tgross