terraform-provider-nomad
Feature: Add ability to drain nodes
When provisioning Nomad client nodes with Terraform, it would be useful to be able to use the Nomad Terraform provider to gracefully drain the node (including waiting for the drain to complete) before destroying the VM.
I've gotten close by using a destroy-time exec provisioner in a null_resource, but the problem is that it's not possible to access a fresh Nomad ACL token. Having this as a feature of the provider fixes that issue (and reduces the brittleness inherent to exec provisioners), since providers can reference credentials directly from data sources that refresh during the plan.
I could see this being a resource that takes in the node ID (either read out of the started VM, or provisioned into the VM using a random_uuid resource) and that could optionally wait for the node to come online. Technically it's not provisioning the node (the node registers itself, after all), but it would validate that the node came up properly, which is an improvement on current workflows, which have no way of checking that.
Then, on destroy, it drains the node, marks it ineligible, and waits for the drain to complete.
Hi @nvx 👋
This is an interesting idea, but I'm not sure it fits well with the Terraform model 🤔
For example, what would it mean to delete this resource? We can't really "undo" a node drain so the resource would kind of just hang there?
These types of one-off operations seem better suited for the Nomad CLI.
> For example, what would it mean to delete this resource? We can't really "undo" a node drain so the resource would kind of just hang there?
Marking as eligible and ineligible for scheduling is a reversible operation; I'd envisage triggering a drain as an option when marking a node ineligible.
While the Nomad CLI can be used for this, it means that the lifecycle of a Nomad worker can no longer be maintained through Terraform alone: you'd need to drain a worker manually before having Terraform do the rest of the lifecycle management, which gets a bit onerous when doing immutable blue/green deployments where destroying resources is a BAU activity.
We've tried to emulate it using destroy-time exec provisioners to call the Nomad CLI, but it gets super ugly trying to access credentials to perform the operation (our Nomad tokens are read out of Vault and short-lived, as per Terraform best practices, to reduce the risk of secrets in the state file).
> Marking as eligible and ineligible for scheduling is a reversible operation
That's a different endpoint, so probably another resource altogether 🙂
So `nomad_node_drain` would use `/v1/node/:node_id/drain` and `nomad_node_eligibility` would use `/v1/node/:node_id/eligibility`. Both endpoints generate an evaluation in Nomad, so I guess that's the resource being "created"?
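For reference, here's a minimal sketch of what these two resources would call under the hood, assuming the official Go client (`github.com/hashicorp/nomad/api`); the node ID and drain parameters are placeholders, not a definitive implementation:

```go
// Sketch only: start a drain and toggle eligibility via the Nomad API.
package main

import (
	"log"
	"time"

	"github.com/hashicorp/nomad/api"
)

func main() {
	// Reads NOMAD_ADDR, NOMAD_TOKEN, etc. from the environment.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	nodeID := "9d2e4a1f-0000-0000-0000-000000000000" // hypothetical node ID

	// nomad_node_drain would hit POST /v1/node/:node_id/drain.
	spec := &api.DrainSpec{
		Deadline:         10 * time.Minute,
		IgnoreSystemJobs: false,
	}
	if _, err := client.Nodes().UpdateDrain(nodeID, spec, false, nil); err != nil {
		log.Fatal(err)
	}

	// nomad_node_eligibility would hit POST /v1/node/:node_id/eligibility.
	if _, err := client.Nodes().ToggleEligibility(nodeID, false, nil); err != nil {
		log.Fatal(err)
	}
}
```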
Changing eligibility on delete seems error prone, so I think a no-op would be better.
Would you be interested in working on a PR for them?
> > Marking as eligible and ineligible for scheduling is a reversible operation
>
> That's a different endpoint, so probably another resource altogether 🙂
> So `nomad_node_drain` would use `/v1/node/:node_id/drain` and `nomad_node_eligibility` would use `/v1/node/:node_id/eligibility`. Both endpoints generate an evaluation in Nomad, so I guess that's the resource being "created"?
I just had a closer look at what the Nomad CLI is doing (to date I've been draining nodes manually), and it looks like the logic is: enabling drain on a node sets its eligibility to false by default (but you can also set the eligibility to false without doing a drain), and cancelling an existing drain changes the eligibility back to true by default (but you can also do this separately from the drain, or cancel a running drain without changing the eligibility back).
I guess in my head I had conflated the drain and eligibility flags, since I'm normally dealing with them together (indeed, I don't think you can start a drain on a host without also marking it ineligible at the same time).
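A small sketch of that cancel behaviour, under the same `github.com/hashicorp/nomad/api` assumption: a `nil` `DrainSpec` cancels a running drain, and the `markEligible` flag controls whether eligibility is restored in the same call:

```go
// Sketch only: cancel a running drain and restore eligibility in one call.
package main

import (
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	nodeID := "9d2e4a1f-0000-0000-0000-000000000000" // hypothetical node ID

	// A nil DrainSpec cancels the drain; markEligible=true also flips the
	// node back to eligible, mirroring the CLI's default behaviour.
	if _, err := client.Nodes().UpdateDrain(nodeID, nil, true, nil); err != nil {
		log.Fatal(err)
	}
}
```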
> Changing eligibility on delete seems error prone, so I think a no-op would be better.
I'm not quite sure I follow what you mean?
When I'm provisioning a Nomad worker with Terraform, I'm currently generating a UUID in Terraform and injecting it into the VM to be the Nomad worker UUID. Having a resource that depends on the VM resource, talks to Nomad, and verifies that the worker has come up and is marked as eligible (which should be the default for a newly joined agent) would be how I see it working. Then on destroy it would mark the worker ineligible and trigger a drain; once the drain completes, the resource would be considered destroyed, allowing the next step in the destroy plan to proceed (e.g. destroying the VM).
I guess I see it more as a resource that manages the lifecycle of the Nomad worker, where draining on destroy is an optional parameter. The work during the create isn't just a no-op either, since it would check that a Nomad agent came up with the provided UUID, which verifies that the VM came up properly even if the amount of actual configuration it needs to do is minimal.
> Would you be interested in working on a PR for them?
Definitely, if we can figure out a pattern for how to achieve the lifecycle in Terraform I'd be glad to make a PR implementing it.
> I guess I see it more as a resource that manages the lifecycle of the Nomad worker, where draining on destroy is an optional parameter. The work during the create isn't just a no-op either, since it would check that a Nomad agent came up with the provided UUID, which verifies that the VM came up properly even if the amount of actual configuration it needs to do is minimal.
Ahh now I see what you mean, thanks!
Usually resources in Terraform providers match 1:1 with a set of remote API CRUD operations, but these endpoints (like `/v1/node/:node_id/drain`) are more transactional, so they don't really create anything (well, they create an evaluation, but evaluations are kind of useless in Terraform).
Your idea is that this resource would be like a sidecar to another resource, like an AWS EC2 instance, so it would need a way to connect the two. You mentioned that you're generating UUIDs, but you would probably need an explicit `depends_on` to establish the destroy order. Another option would be to use the node metadata that Nomad automatically fingerprints in some cloud providers; referencing the instance's attributes there would create the dependency implicitly.
If I understood correctly, this is an example of how this resource would be used:
resource "aws_instance" "nomad_client_1" {
# ...
}
resource "nomad_node_manager" "nomad_client_1" {
drain_on_destroy = true
drain_deadline = "10m"
drain_ignore_system_jobs = false
drain_meta = {
triggered_by = "Terraform"
}
eligible = true
filter {
type = "id"
values = [ ... ]
}
filter {
type = "meta"
key = "unique.platform.aws.instance-id"
values = [aws_instance.nomad_client_1.id]
}
}
On create, this resource would wait until all the provided filters match a registered node in the Nomad `/v1/nodes` API result. Once the nodes are all up and ready, their IDs are stored in the Terraform state.
On read, the resource would just check the IDs in the Terraform state to see if the nodes are still registered.
On update, the filters must be checked again, so it's very similar to create. But it would also check the `eligible` attribute and update the nodes accordingly. For example, you could set all nodes managed by this resource as ineligible by switching that attribute to `false`.
On destroy, the resource checks the value of `drain_on_destroy`. If `true`, the resource will trigger a drain on all nodes that it manages, following the provided `drain_*` attributes. Since this resource depends on others (either implicitly via referencing external IDs, or explicitly via `depends_on`), it would be destroyed first by Terraform (I think 😅).
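To make the create and destroy steps concrete, here's a rough sketch using the Go client (`github.com/hashicorp/nomad/api`); the `waitForNode` and `drainNode` helpers, the hard-coded metadata key and values, and the single-filter handling are all hypothetical simplifications of the `filter` blocks above:

```go
// Sketch only: wait for a node on create, drain and monitor it on destroy.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/hashicorp/nomad/api"
)

// waitForNode polls /v1/nodes until a ready node carries the expected metadata.
func waitForNode(client *api.Client, metaKey, metaValue string, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		stubs, _, err := client.Nodes().List(nil)
		if err != nil {
			return "", err
		}
		for _, stub := range stubs {
			node, _, err := client.Nodes().Info(stub.ID, nil)
			if err != nil {
				return "", err
			}
			if node.Meta[metaKey] == metaValue && node.Status == "ready" {
				return node.ID, nil
			}
		}
		time.Sleep(5 * time.Second)
	}
	return "", fmt.Errorf("no ready node matched %s=%s", metaKey, metaValue)
}

// drainNode triggers a drain and blocks until it completes.
func drainNode(client *api.Client, nodeID string, deadline time.Duration) error {
	spec := &api.DrainSpec{Deadline: deadline}
	resp, err := client.Nodes().UpdateDrain(nodeID, spec, false, nil)
	if err != nil {
		return err
	}
	// The channel closes once the drain has finished.
	for msg := range client.Nodes().MonitorDrain(context.Background(), nodeID, resp.LastIndex, false) {
		log.Println(msg)
	}
	return nil
}

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// Create: wait for the node to register; destroy: drain it.
	nodeID, err := waitForNode(client, "unique.platform.aws.instance-id", "i-0123456789abcdef0", 10*time.Minute)
	if err != nil {
		log.Fatal(err)
	}
	if err := drainNode(client, nodeID, 10*time.Minute); err != nil {
		log.Fatal(err)
	}
}
```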
How does this sound? Is it similar to what you had in mind?
> How does this sound? Is it similar to what you had in mind?
Yup that's exactly it.
I like your idea of using the node metadata to filter on as well; it would be handy where the cloud platform provides that information, and a little more flexible than having to explicitly provision the node ID ahead of time.
A timeout for how long to wait on create for the node to appear before considering it failed might be worth adding too.
There's a standard way to handle timeouts in Terraform: https://www.terraform.io/docs/extend/resources/retries-and-customizable-timeouts.html
So it's mostly a matter of making sure that `Timeouts` is set in the resource schema and that the value is used wherever it's necessary. Here's an example of it being used in the `nomad_job` resource:
https://github.com/hashicorp/terraform-provider-nomad/blob/main/nomad/resource_job.go#L305-L308
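For illustration, a minimal sketch of wiring that up for the hypothetical `nomad_node_manager` resource, assuming `terraform-plugin-sdk/v2`; `resourceNodeManagerCreate` and `nodeIsRegistered` are placeholder names, not the provider's actual code:

```go
// Sketch only: declaring Timeouts and honoring them during create.
package nomad

import (
	"context"
	"fmt"
	"time"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

func resourceNodeManager() *schema.Resource {
	return &schema.Resource{
		Create: resourceNodeManagerCreate,
		// ...

		// Users can override these with a timeouts {} block in their config.
		Timeouts: &schema.ResourceTimeout{
			Create: schema.DefaultTimeout(10 * time.Minute),
			Delete: schema.DefaultTimeout(30 * time.Minute),
		},

		Schema: map[string]*schema.Schema{
			"eligible": {Type: schema.TypeBool, Optional: true, Default: true},
			// ...
		},
	}
}

func resourceNodeManagerCreate(d *schema.ResourceData, meta interface{}) error {
	// Retry until the node shows up, honoring the configured create timeout.
	return resource.RetryContext(context.Background(), d.Timeout(schema.TimeoutCreate), func() *resource.RetryError {
		found, err := nodeIsRegistered(meta)
		if err != nil {
			return resource.NonRetryableError(err)
		}
		if !found {
			return resource.RetryableError(fmt.Errorf("node not registered yet"))
		}
		return nil
	})
}

// nodeIsRegistered is a hypothetical helper; a real implementation would
// query the Nomad API for a node matching the configured filters.
func nodeIsRegistered(meta interface{}) (bool, error) { return true, nil }
```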
Would you still be interested in submitting a PR with this spec?
> There's a standard way to handle timeouts in Terraform: https://www.terraform.io/docs/extend/resources/retries-and-customizable-timeouts.html
>
> So it's mostly a matter of making sure that `Timeouts` is set in the resource schema and that the value is used wherever it's necessary. Here's an example of it being used in the `nomad_job` resource: https://github.com/hashicorp/terraform-provider-nomad/blob/main/nomad/resource_job.go#L305-L308
Good to know!
> Would you still be interested in submitting a PR with this spec?
Definitely. Just a matter of getting some free time to look at it. This time of year ends up being rather hectic unfortunately.
No worries! We'll be around to help whenever you're ready to work on it 🙂