terraform-provider-nomad icon indicating copy to clipboard operation
terraform-provider-nomad copied to clipboard

Feature: Add ability to drain nodes

Open nvx opened this issue 2 years ago • 10 comments

When provisioning nomad client nodes with Terraform, it would be useful to be able to use the Nomad terraform provider to gracefully drain (including waiting on the drain to complete) before destroying the VM.

I've gotten close by using a destroy time exec provisioner in a null_resource, but the problem is it is not possible to access a fresh Nomad ACL token. Having it as a feature of the provider fixes this issue (as well as reduces the brittleness inherent to exec provisioners) as providers can reference credentials from data sources directly that refresh during the plan.

nvx avatar Sep 14 '21 05:09 nvx

I could see this being a resource that takes in the node id (either read out of the started VM, or by provisioning in the node ID into the VM using a random_uuid data source) that could optionally wait for the node to come online - technically not provisioning the node (the node registers itself after all) but would validate that the node came up properly which is an improvement on current workflows which have no way of checking the node has come up properly.

Then on destroy it drains the node and marks ineligible and waits for the drain to complete.

nvx avatar Sep 14 '21 05:09 nvx

Hi @nvx 👋

This is an interesting, but I'm not sure if it fits well with the Terraform model 🤔

For example, what would it mean to delete this resource? We can't really "undo" a node drain so the resource would kind of just hang there?

These types of one-off operations seem better suited for the Nomad CLI.

lgfa29 avatar Nov 03 '21 00:11 lgfa29

For example, what would it mean to delete this resource? We can't really "undo" a node drain so the resource would kind of just hang there?

Marking as eligible and ineligible for scheduling is a reversible operation, I'd envisage triggering a drain as an option when marking ineligible.

While the Nomad CLI can be used for this, it means that the lifecycle of a Nomad worker can no longer be maintained through Terraform alone as you'd need to drain a worker manually before having Terraform do the rest of the lifecycle management which can get a bit onerous when doing immutable blue/green deployments where destroying resources is a BAU activity.

We've tried to emulate it using destroy time exec provisioners to call the Nomad CLI but it gets super ugly trying to access credentials to perform the operation (our Nomad tokens are read out of Vault and short lived as per Terraform best practices to reduce the risk of secrets in the state file).

nvx avatar Nov 03 '21 05:11 nvx

Marking as eligible and ineligible for scheduling is a reversible operation

That's a different endpoint, so probably another resource altogether 🙂

So nomad_node_drain would use /v1/node/:node_id/drain and nomad_node_elibigility would use /v1/node/:node_id/eligibility. Both endpoints generate an evaluation in Nomad, so I guess that's the resource being "created"?

Changing eligibility on delete seems error prone, so I think a no-op would be better.

Would you be interested in working on a PR for them?

lgfa29 avatar Nov 03 '21 13:11 lgfa29

Marking as eligible and ineligible for scheduling is a reversible operation

That's a different endpoint, so probably another resource altogether 🙂

So nomad_node_drain would use /v1/node/:node_id/drain and nomad_node_elibigility would use /v1/node/:node_id/eligibility. Both endpoints generate an evaluation in Nomad, so I guess that's the resource being "created"?

I just had a closer look what the Nomad CLI is doing (to date I've been draining nodes manually), and it looks like the logic is enabling drain on a node sets the eligibility to false by default (but you can also set the eligibility to false without doing a drain), and cancelling an existing drain changes the eligibility back to true by default (but you can also do this separately to the drain as well, or cancel a running drain without changing the eligibility back).

I guess in my head I had conflated the drain and eligibility flags together since normally I'm dealing with them together (indeed I don't think you can start a drain on a host without also marking it ineligible at the same time).

Changing eligibility on delete seems error prone, so I think a no-op would be better.

I'm not quite sure I follow what you mean?

When I'm provisioning a Nomad worker with Terraform, currently I'm generating a UUID in Terraform and injecting it into the VM to be the Nomad worker UUID. Having a resource that depends on the VM resource that talks to Nomad and verifies the worker has come up and is marked as eligible (which should be the default for a newly joined agent) would be how I see it working. Then on destroy it would mark the worker ineligible and trigger a drain, once the drain completes it would be marked as done allowing for the next step in the destroy plan to complete (eg, destroying the VM).

I guess I see it more as a resource that manages the lifecycle of the Nomad worker, of draining on destroy is an optional parameter. The work during the create isn't just a no-op either since it would check that a Nomad agent came up with the provided UUID which verifies that the VM came up properly even if the amount of actual configuration it needs to do is minimal.

Would you be interested in working on a PR for them?

Definitely, if we can figure out a pattern for how to achieve the lifecycle in Terraform I'd be glad to make a PR implementing it.

nvx avatar Nov 04 '21 07:11 nvx

I guess I see it more as a resource that manages the lifecycle of the Nomad worker, of draining on destroy is an optional parameter. The work during the create isn't just a no-op either since it would check that a Nomad agent came up with the provided UUID which verifies that the VM came up properly even if the amount of actual configuration it needs to do is minimal.

Ahh now I see what you mean, thanks!

Usually resources in Terraform providers match 1:1 with a set of remote API CRUDs, but these endpoints (like /v1/node/:node_id/drain) are more transnational so they really create anything (well, they create an evaluation, but they are kind of useless in Terraform).

Your idea is that this resource would be like a sidecar to another resource, like an AWS EC2 instance. It would need a way to connect the two. You mentioned that you're generating UUIDs, but you will probably need an explicit depends_on to establish the destroy order. Another option would be to use the node metadata the Nomad automatically fingerprints in some cloud providers. That would create an explicit dependency.

If I understood correctly, this is an example of how this resource would be used:

resource "aws_instance" "nomad_client_1" {
  # ...
}

resource "nomad_node_manager" "nomad_client_1" {
  drain_on_destroy         = true
  drain_deadline           = "10m"
  drain_ignore_system_jobs = false
  drain_meta = {
    triggered_by = "Terraform"
  }

  eligible = true

  filter {
    type = "id"
    values = [ ... ]
  }
  
  filter {
    type   = "meta"
    key    = "unique.platform.aws.instance-id"
    values = [aws_instance.nomad_client_1.id]
  }
}

On create, this resource would wait until all filters provided match a registered node in the Nomad /v1/nodes API result. Once they are all up and ready, their IDs are store in the Terraform state.

On read, the resource would just check the IDs in the Terraform state to see if the nodes are still registered.

On update, the filters must be checked again, so very similar to create. But it would also check the eligible attribute and update the nodes accordingly. For example, you can see all nodes managed by this resource as ineligible by switching that attribute to false.

On destroy, the resource checks the values of drain_on_destroy. If true, the resource will trigger a drain in all nodes that it manages, following the provided drain_* attributes. Since this resource depends on others (either implicitly via referencing external IDs, or explicitly via depends_on), it would be destroy first by Terraform (I think 😅).

How does this sounds? Is it similar to what you had in mind?

lgfa29 avatar Nov 24 '21 23:11 lgfa29

How does this sounds? Is it similar to what you had in mind?

Yup that's exactly it.

I like your idea of using the node metadata to filter on it as well, would be handy where the cloud platform provides that information. A little more flexible than having to explicitly provision in the node ID ahead of time.

A timeout for how long to wait on the create for the Node to appear before considering it failed might be worth adding too.

nvx avatar Nov 29 '21 06:11 nvx

There's a standard way to handle timeouts in Terraform: https://www.terraform.io/docs/extend/resources/retries-and-customizable-timeouts.html

So it's mostly a matter of making sure that Timeouts is set in the resource schema and that the value is used wherever it's necessary. Here's an example of it being used in the nomad_job resource: https://github.com/hashicorp/terraform-provider-nomad/blob/main/nomad/resource_job.go#L305-L308

Would you still be interested in a submitting a PR with this spec?

lgfa29 avatar Dec 15 '21 01:12 lgfa29

There's a standard way to handle timeouts in Terraform: https://www.terraform.io/docs/extend/resources/retries-and-customizable-timeouts.html

So it's mostly a matter of making sure that Timeouts is set in the resource schema and that the value is used wherever it's necessary. Here's an example of it being used in the nomad_job resource: https://github.com/hashicorp/terraform-provider-nomad/blob/main/nomad/resource_job.go#L305-L308

Good to know

Would you still be interested in a submitting a PR with this spec?

Definitely. Just a matter of getting some free time to look at it. This time of year ends up being rather hectic unfortunately.

nvx avatar Dec 17 '21 00:12 nvx

No worries! We'll be around to help whenever you're ready to work on it 🙂

lgfa29 avatar Dec 21 '21 22:12 lgfa29