terraform-provider-nomad
terraform-provider-nomad copied to clipboard
Terraform fails to notice when a Nomad job has changed
This issue was originally opened by @blalor as hashicorp/terraform#14038. It was migrated here as part of the provider split. The original body of the issue is below.
Terraform Version
Terraform v0.9.3
Affected Resource(s)
- nomad_job
Terraform Configuration Files
provider "nomad" {
address = "http://localhost:4646"
}
resource "nomad_job" "example" {
jobspec = "${file("${path.module}/example.nomad")}"
}
Debug Output
Log for 2nd terraform apply
: apply.log.txt
Console output:
$ TF_LOG=TRACE TF_LOG_PATH=apply.log.txt terraform apply
nomad_job.example: Refreshing state... (ID: example)
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Expected Behavior
Terraform should have noticed that the job had changed in the cluster and updated it.
Actual Behavior
Nuttin', honey.
Steps to Reproduce
-
nomad agent -dev
-
terraform apply
-
nomad status example
shows one allocation for task groupgrp
-
change task group count:
curl -sfS localhost:4646/v1/job/example | \ jq '.TaskGroups[0].Count = 2 | {"Job": .}' | \ curl -is -X PUT -d@- localhost:4646/v1/job/example
-
nomad status example
shows two allocations forexample.grp
-
terraform plan
shows no change -
terraform apply
makes no change -
nomad status example
still shows two allocations
@paddycarver / @grubernaut / @radeksimko, is this a verified issue/bug? I'm seeing a slightly different issue, but not sure if it's the same as this. If a job has been stopped, or is not running (but the spec has not changed), then the TF nomad provider should update nomad. It might be best to always have the job sent to nomad, and let nomad work out issues with differences in the spec? This issue makes the provider unusable for the most basic use cases.
The underlying issue here also produces the following behavior:
- run a job with the terraform provider
-
nomad stop
that job manually - re-apply the terraform plan, the job is still stopped
Hi @ketzacoatl! I can't say for sure whether this is a verified bug, nor can I explain the behaviour. I'll try to look into this soon and come back with a bit more information, and a fix if necessary. Apologies for the delays on this.
hi @paddycarver, thanks for taking a look! were you able to confirm the behavior we see in practice?
My observation is the same as @ketzacoatl.
It would be just more operator friendly if users could interact via Terraform, rather than nomad status
My versions are Terraform 0.11.5 Nomad (built from master) (0.8.0-dev)
Any updates on this @paddycarver?
Also, a question: Does the Terraform provider always submit the job to nomad, or does it decide whether or not it should?
@paddycarver I took a quick look inside the codebase, and it seems the culprit is:
-
resourceJobRead
is not implemented - there is no metadata to discern changes other than job ID within the schema
- de-then-registering only occurs when ID itself changes
#15 by @apparentlymart adds more metadata, but modify_index
wouldn’t pick up changes from the provider. I think “Solution 1” below is a nice way of dealing with it.
Solution 1: add status
computed metadata to the schema, plus #15 other items
- Upon read, check with
job.Stopped()
whether it should be re-registered - Notify through metadata if stopped that recreation is necessary
Pros
- Smarter information for
plan
- Will always restart a stopped job according to configuration
Cons
-
plan
now has to read more information before, possibly slowing it down - More changes to the codebase
Solution 2: Always de-register and re-register
- Refactor
resourceJobRegister
so that it always de-then-re-register, regardless of configuration change or not. - Always mark resource job specs as tainted
Pros
- Easy to add
- Never leads to stale jobs not being picked up
- Retains insanely fast
plan
with Nomad jobs
Cons
- Users probably do not expect that to happen if they did not change their configurations
- Ugly plans with unnecessary information
@paddycarver, I'd love to help resolve this problem, as IMO it prevents serious use of Terraform to manage nomad, do you have any guidance or recommendations on how you would like to address the problem?
cc @katbyte / @radeksimko
@mitchellh I'd be very grateful for your feedback/guidance here.
@cgbaker, With renewed development efforts here, what are your thoughts on this issue?
Hi @ketzacoatl, I wholeheartedly agree that this is something that must be addressed. This shortcoming makes the Nomad provider just about useless for Day 2 operations.
As noted in #15 , there are a few different options for addressing this. The solution partially hinges on whether we want to try to find a general solution that doesn't require modifying and re-releasing the provider every time Nomad releases a new version. On the other hand, with the Nomad product team taking ownership of this TF provider, we can potentially address such a workflow a little better than before. And it may be prudent to find a temporary solution to this issue while we work on a generic/unversioned provider.
The Nomad team is committed to addressing this; I will post an update here soon giving an idea as to the timetable, but my intention is to either resolve this issue as part of the upcoming 1.3.0 version of the provider (targeting Nomad 0.8.x) or the 1.4.0 version (targeting Nomad 0.9.x).
Thank you for your patience and persistence on this issue.
And it may be prudent to find a temporary solution to this issue while we work on a generic/unversioned provider.
That would be great, WRT being able to continue using the provider while future improvements get worked out.
Nomad 0.8.x API support is available in the Nomad v1.3.0 that released today. I will look at finding a longer-term solution for this in upcoming versions; having said that, even if we continue version the Nomad providers, we pledge to be much more responsive in updating the Nomad provider going forward.
The Nomad team is committed to addressing this; I will post an update here soon giving an idea as to the timetable, but my intention is to either resolve this issue as part of the upcoming 1.3.0 version of the provider (targeting Nomad 0.8.x) or the 1.4.0 version (targeting Nomad 0.9.x).
Hello, just checking back up on this - have there been any improvements related to this?
No update as of now, we've been focusing on core Nomad work, finishing the 0.9.0 release.
On Thu, Apr 4, 2019 at 1:02 AM ketzacoatl [email protected] wrote:
The Nomad team is committed to addressing this; I will post an update here soon giving an idea as to the timetable, but my intention is to either resolve this issue as part of the upcoming 1.3.0 version of the provider (targeting Nomad 0.8.x) or the 1.4.0 version (targeting Nomad 0.9.x).
Hello, just checking back up on this - have there been any improvements related to this?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/terraform-providers/terraform-provider-nomad/issues/1#issuecomment-479750800, or mute the thread https://github.com/notifications/unsubscribe-auth/ABmPT9wkf1S_J33gI6wrFz0StYLsaQTuks5vdYdQgaJpZM4N5Ctd .
Fair enough, thanks for the update!
Update: this was not resolved in the 1.4.0 release of the provider, but we're still tracking this.
I am stoked to see this on the 1.5 milestone, rock on!
It have been a year since the last comment. Any update on this?
@cgbaker WRT the roadmap, are there refactors or internal changes that block fixing this properly?
we were waiting for the update 2.0 plugin SDK to see if it helped deal with this. we're actively looking at it now; we want this issue dealt with in the next few months as part of the Nomad 1.0 milestone.
Hi everyone,
This issue has been going for a while, but we are still actively tracking it. I also think we are overdue for a more thorough update, so here is a description of the problem and what we've tried so far (and why it didn't work).
Why this is happening
This issue happens because the job resource doesn’t have separate attributes for each possible field in the jobspec. Instead, it receives a jobspec as a string (or a path to file that is read into a string), and parses some of the fields into computed attributes.
Since all attributes are computed, Terraform is not able to automatically detect changes, so the provider needs a customized diff function to properly communicate to Terraform if anything has changed. Unfortunately Terraform doesn’t provide the current value stored in the state for computed fields to the diff function, so it can't actually detect changes.
What we've tried so far
Here are some of approaches we've tried and why they didn't work.
Declare all jobspec fields as attributes
Having the jobspec as the single attribute forces us to do Terraform’s job of computing the diff, which, as described above, we can’t do properly. Having the full jobspec described as the resource schema would be more in-line with the way Terraform works.
But even though both Nomad and Terraform use HCL, the structures supported by each of them are different, thus creating a new syntax for defining jobs that is not compatible with the nomad job run
command nor the Nomad API.
For example, in Nomad a group is defined as:
job "example" {
group "cache" {
...
}
}
In Terraform this job would be declared as:
resource "nomad_job_new" "example" {
name = "example"
group {
name = "cache"
}
}
While having the entire jobspec as the single input for the resource is not ideal, it does provide a nice user experience, where the job is defined only once and can be reused in many different places.
You can check the code for this attempt here.
Having jobspec fields defined as optional attributes
If we keep the jobspec
attribute but also have the other fields allowed in the job as optional attributes, we could keep the nice user experience of being able to reuse job files but also define some of the values as proper Terraform attributes.
Unfortunately this doesn’t work well because if an attribute is not marked as computed
the provider is not able to set a value for it, so anything that is defined in the jobspec but not in the Terraform resource would not be tracked.
Having jobspec fields defined as optional and computed attributes
Terraform does allow attributes to be optional
and computed
at the same time. But this falls back to the original issue where the current state value for computed attributes are not available to customized diff functions.
I hope this explanation was helpful, and we will provide more updates as we go. In the meantime feel free to sends us any questions, comments or ideas.
Thanks for the update @lgfa29, it certainly gives clarity on why this is not trivial.
Just curious if you have a current working idea of how this should be addressed or if it's still being explored?
Yes, we are still actively looking for a solution. I will make sure to post new updates, even for failed attempts.
Hi, I've not yet tried it but why not leverage the diff abilities from Nomad to make the diff? Either by exposing it in the Nomad client or adding an endpoint that would return the diff between the new job and the current one (but not actually submitting it).
hi @remilapeyre , we already do that for CustomizeDiff
:
https://github.com/terraform-providers/terraform-provider-nomad/blob/master/nomad/resource_job.go#L518
the issue is that the current Terraform plugin SDK doesn't care about all those computed
fields that we get from the Nomad API... we would need to update the jobspec
. there are some options there, where we use it as a sentinel, which we're looking into. and we've had some discussion with the Terraform plugin SDK team. We are pursuing a few different options, one of which may just be a new, structured nomad_job
resource which doesn't have these problems.
@lgfa29 I like the progress in that PR, and I would love to see that. I have zero qualms as a user with cutting off a major version and breaking a lot of compatibility. We already do this today with the Vault provider where some of our modules depend on older versions and others on newer versions.
I am a BIG! +1 on
resource "nomad_job_new" "example" {
name = "example"
group {
name = "cache"
}
}
I can see how I could modularize this already and produce a terraform module to be consumed upstream from my team with a ton of defaults (e.g.: add sidecars automatically into folks tasks)
Thank you @cgbaker and @lgfa29 for the update. This explains a lot, I had wondered why Nomad's diff features weren't being used here and I did not understand the code itself super well.
Thanks for the feedback @chuckyz. Creating a new resource seems to be the best approach we have right now, so we will move forward with that (but with a better name than nomad_job_new
😄).
Thanks for the feedback @chuckyz. Creating a new resource seems to be the best approach we have right now, so we will move forward with that (but with a better name than
nomad_job_new
smile).
Would it make sense to incorporate it in #149?
Thanks for the feedback @chuckyz. Creating a new resource seems to be the best approach we have right now, so we will move forward with that (but with a better name than
nomad_job_new
smile).Would it make sense to incorporate it in #149?
Yup. #149 is being developed exactly to solve this issue 👍