Add arbitrary claims to a job's workload identity
Exposing Nomad workload identity JWTs inside jobs greatly enhances the flexibility operators have when configuring jobs to authenticate to external systems like Vault, Consul, etc. Currently, a job's identity JWT is limited to four main claims: nomad_namespace, nomad_job_id, nomad_allocation_id, and nomad_task. While a good start, even more flexibility could be had if an operator could configure Nomad to add arbitrary keys from the jobspec as claims to a job's identity JWT. Even allowing a subset of the jobspec keys, especially the "Meta" key, would be good enough.
Proposal
Within the new identity block, allow an operator to specify arbitrary keys from the jobspec that would be added as claims to a job's JWT. The primary use case for me would be to have the contents of the "Meta" block for a job baked into the JWT. This would allow dramatically more flexibility when using templated Vault policies, since a richer source of metadata would be available for use. Example jobspec:
job "docs" {
group "example" {
task "api" {
####
other stuff
####
identity {
name = "example"
aud = ["oidc.example.com"]
file = true
ttl = "1h"
extra_claims = ["job.meta"] <----------------- some suitable way to identify keys of interest
}
}
}
meta {
foo = bar
}
}
Would give the following JWT:
{
  "aud": "oidc.example.com",
  "exp": 1702121612,
  "iat": 1702118012,
  "jti": "8a5f5fb9-2a0c-a4f7-7583-b75c4a1b0766",
  "nbf": 1702118012,
  "nomad_allocation_id": "c30140d4-6106-2350-ff90-6cc9a3b9e7ab",
  "nomad_job_id": "docs",
  "nomad_namespace": "default",
  "nomad_task": "api",
  "sub": "foo",
  "job.meta": {
    "foo": "bar" <-------------------------- and it comes out as a claim
  }
}
Use-cases
- Have much finer-grained control over Vault policies that are templated with metadata claims (see the policy sketch after this list).
- Provide richer information to external systems by securely passing the JWTs from Nomad jobs.
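As a rough illustration of the first use-case, templated Vault ACL policies can key off entity alias metadata that Vault populates from JWT claims. The sketch below is only an assumption-laden example: the mount accessor auth_jwt_badc0ffe and the owner metadata key are made up, and any extra claims would still have to be mapped into alias metadata by the JWT auth role's claim_mappings.

  # Sketch of a templated Vault ACL policy. "auth_jwt_badc0ffe" stands in for the
  # accessor of the JWT auth mount Nomad authenticates against, and "owner" is a
  # hypothetical key copied from the job's meta block into the alias metadata.
  path "kv/data/{{identity.entity.aliases.auth_jwt_badc0ffe.metadata.nomad_job_id}}/*" {
    capabilities = ["read"]
  }

  # With an extra claim sourced from job metadata, secrets could instead be
  # organised by owner/team rather than strictly by job ID:
  path "kv/data/teams/{{identity.entity.aliases.auth_jwt_badc0ffe.metadata.owner}}/*" {
    capabilities = ["read"]
  }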
Attempted Solutions
There's no real way to satisfy this requirement using existing Nomad features. This functionality requires cryptographic guarantees from the workload identity engine itself.
Hi @vftaylor! As it turns out, we've had this same request through various backchannels. The main thing that looks tricky with this is making sure that job submitters can't use this to escalate privileges in a way the cluster administrator doesn't expect. So we're talking through the feasibility.
The main thing that looks tricky with this is making sure that job submitters can't use this to escalate privileges in a way the cluster administrator doesn't expect.
@tgross I agree with the concern. Several thoughts:
- Before workload identity, a job submitter could escalate privileges in Vault (for example) by specifying arbitrary Vault policies in the jobspec. Is this scenario a lot different?
- To mitigate the privilege escalation concern, you could make it so that "customised" workload identities are only allowed in tokens with specific aud claims. That way, the ultimate recipient of the token has a way to distinguish super-trustworthy vs. semi-trustworthy tokens.
I was asked to come discuss here. Currently, our existing way of doing things would be covered by having multiple roles that receive different policies, but the only real distinguishing feature there is application names (i.e. every application currently has one policy for itself, and one policy for each database it needs to access); this does translate back to JWT roles, given that I can compose a role out of several policies.
However, at this point in time we use a separate service to authenticate apps against each other. This service also talks to Nomad to extract a few things from the job's meta block, where we define such fun things as "who owns this app" and some other stuff I don't want to discuss openly (secrets... terrible, terrible secrets - but not that secret since they're in a jobspec file :P). Ideally, I would love a configuration option (marked in the docs as "if you do this, you could open yourself up to privilege escalation and other such fun things", obviously) to copy either the entire meta block of a jobspec, or only a subset, into a meta key in the claim mappings - just so we can get rid of that service-auth-service thing.
I figure the easiest bit would be to just flat out copy the meta block.
As far as privilege escalation goes, though, I don't see how this would allow for it unless specifically implemented on the consuming end. In our case, we want to use the workload identity to get rid of said service that authenticates services to each other and use the identity instead; a consuming service won't blindly accept the claim mappings - at least, it shouldn't, but I've got my developers trained right and the attitude adjustment stick is never far away. Also, with the way we use Nomad, nobody gets to submit jobs directly; everything goes through an API where I do a painful amount of scrubbing on the supplied data before it generates a job, so in my (super-special-snowflake, maybe) case it's a trivial non-issue. Can't say if that floats for other people, though...
And what @vftaylor said, you could already do this by specifying additional policies in the vault.policies list, unless you're thinking of something entirely different re: escalation.
I'm also waiting for these extra claims. That's a must-have. @benvanstaveren, when you are using workload identity, you cannot use vault.policies. You will see a warning like:
Job Warnings:
1 warning:
* Task xxxx has a Vault block with policies but uses workload identity to authenticate with Vault, policies will be ignored
I'm also waiting for these extra claims. That's a must-have. @benvanstaveren, when you are using workload identity, you cannot use vault.policies. You will see a warning like: Job Warnings: 1 warning: * Task xxxx has a Vault block with policies but uses workload identity to authenticate with Vault, policies will be ignored
I know I can't use vault.policies - I'm just illustrating how we do it now. I honestly don't think we'll be upgrading past 1.8 if the current implementation of workload identity stays the way it is and the docs aren't updated to explain, very explicitly and clearly, how to set it up on multiple federated clusters while keeping the ability to emergency-schedule things on a different cluster.
@benvanstaveren you need to use vault.role instead. Then create the role on your JWT auth path (under /role) and associate the policies you want with it. I've been able to do that for maybe three weeks now and it works. I think vault.role came in version 1.7.4... I don't remember.
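For anyone landing on this later, the jobspec side of that workflow is small. A minimal sketch, using a placeholder role name that would need to exist on the Vault JWT auth mount Nomad is configured against:

  task "api" {
    # The Vault role must be pre-created by the cluster/Vault admin on the JWT
    # auth mount (e.g. auth/jwt-nomad/role/nomad-workloads-api); the role, not
    # the jobspec, decides which policies the task's Vault token receives.
    vault {
      role = "nomad-workloads-api" # placeholder role name
    }
  }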
The main thing that looks tricky with this is making sure that job submitters can't use this to escalate privileges in a way the cluster administrator doesn't expect. So we're talking through the feasibility.
For a lot of use-cases, having additional claims somewhere would suffice, even if it's in a mandatory metadata field that we cannot 'escape' out of.
{
  "aud": "oidc.example.com",
  "exp": 1702121612,
  "iat": 1702118012,
  "jti": "8a5f5fb9-2a0c-a4f7-7583-b75c4a1b0766",
  "nbf": 1702118012,
  "nomad_allocation_id": "c30140d4-6106-2350-ff90-6cc9a3b9e7ab",
  "nomad_job_id": "docs",
  "nomad_namespace": "default",
  "nomad_task": "api",
  "metadata": {
    "foo": "bar"
  }
}
That should prevent privilege escalation, while still enabling access to custom values.
Our use-case:
- Nomad generates identity token for Vault
- Vault parses this identity token and saves whatever it can as user_claims into the metadata of that auth alias
- Vault generates an OIDC token for the Nomad task (secret), which makes use of the claims that were saved into the alias metadata
But nowhere does Vault allow us to (for example) add a prefix/suffix to it. Not even capitalization. So enabling Nomad to specify more useful values to the feature-lacking Vault would help a lot.
Then again, if Nomad allows custom values at root level, we could forego Vault for our use-case and then we can make our application (SurrealDB) just accept the identity tokens that Nomad generates directly.
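For readers unfamiliar with that flow, the claim-to-metadata step is usually configured on the Vault side via the JWT auth role's claim_mappings. The Terraform sketch below is only illustrative: the mount path, role name, and policy are placeholders, and any extra claims from Nomad would only reach alias metadata if they were added to claim_mappings here.

  # Illustrative only: maps claims from Nomad's workload identity JWT into the
  # auth alias metadata, where templated policies (and identity token templates)
  # can reference them. Backend path, role name, and policy are placeholders.
  resource "vault_jwt_auth_backend_role" "nomad_workloads" {
    backend         = "jwt-nomad"
    role_name       = "nomad-workloads"
    role_type       = "jwt"
    bound_audiences = ["vault.io"]
    user_claim      = "nomad_job_id"
    token_policies  = ["nomad-workloads"]
    token_type      = "service"

    # Each claim listed here is copied into the entity alias metadata under the
    # given name; an extra claim such as "job.meta" would need an entry too.
    claim_mappings = {
      nomad_namespace = "nomad_namespace"
      nomad_job_id    = "nomad_job_id"
      nomad_task      = "nomad_task"
    }
  }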
Before workload identity, a job submitter could escalate privileges in Vault (for example) by specifying arbitrary Vault policies in the jobspec.
How does this work? As far as my experience goes, this is not possible, because you must pass a VAULT_TOKEN with the correct policies when you are attempting to schedule a job with arbitrary Vault policies.
How does this work? As far as my experience goes, this is not possible, because you must pass a VAULT_TOKEN with the correct policies when you are attempting to schedule a job with arbitrary Vault policies.
Workload Identity removes the requirement for users to submit the Vault token, in favor of a trust relationship between Nomad and Vault such that Nomad gets the Vault token via signed workload identities. That's entirely the purpose of WI.
Thanks, my question referred to the state before WI though. I don't see how the privilege escalation would work right now, because as far as I know I need to provide a token which already proves that I have access to a certain policy.
Workload identities seem to weaken this assumption: as a user, I can now schedule any workload and get whatever the job identity might have access to, unless I am misunderstanding:
So before WI a user would have to:
- specify arbitrary vault policies
- get a vault token with those policies
- present this token to Nomad to prove they are allowed to request these policies and pass them to the job
With WI:
- A user can specify one pre-defined role
- Does not need to prove they are allowed to schedule a job with access to this role, they just schedule it (given they are allowed to schedule a job in the given namespace)
How would a user prove that they are allowed to schedule that particular workload? The only two things I can come up with are:
- namespace policies
- sentinel policies
Namespaces are extremely inflexible: we would need to create a namespace per role to limit access properly (and create hundreds of roles to allow the same flexibility we had before with policies). Sentinel policies are an Enterprise feature which our organization cannot afford right now, so they are not an option.
I don't see any other ACLs listed in https://developer.hashicorp.com/nomad/tutorials/access-control/access-control-create-policy#write-the-policy-rules that would allow controlling this.
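For reference, the namespace-level control being discussed is just a standard Nomad ACL policy along these lines (the namespace name is a placeholder), which is exactly why it is coarse: any job a token can submit into the namespace picks up whatever Vault access is reachable by workloads in that namespace.

  # Nomad ACL policy granting job submission in a single namespace; under the
  # WI workflow, everything submitted here can obtain whatever any Vault role
  # reachable by this namespace's jobs grants. Namespace name is a placeholder.
  namespace "payments-prod" {
    policy = "write"
  }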
This would not be resolved with arbitrary claims added from the user side either, but it might possibly be resolved with Nomad adding, e.g., a user identity, or roles and groups derived from the Nomad token.
Thanks, my question referred to the state before WI though. I don't see how the privilege escalation would work right now, because as far as I know I need to provide a token which already proves that I have access to a certain policy.
Workload identities weaken this assumption it seems
Right. Namespace policies, Sentinel policies, and templated Vault policies on Vault roles (example from tutorial) are the intended controls in the Workload Identity workflow.
Namespaces are extremely inflexible, we would need to create namespaces per role to limit access properly (and create hundreds of roles to allow the same flexibility we had before with policies)
There's definitely a bit more up-front work involved, but the resulting workflow for job authors is easier (they don't need a Nomad token, Vault token, and Consul token to submit a job).
In any case, this discussion isn't about whether we're going to have users submitting their own Vault tokens (which, as published late last year, is the legacy workflow we're removing in Nomad 1.10). This discussion is around features intended to add additional flexibility so that cluster admins have finer-grained control over policies, just as you're suggesting you'd want. An alternate proposal is over in https://github.com/hashicorp/nomad/issues/23510.
I wanted to follow-up on this issue with something we've discussed over in #23510. From my most recent comment there (https://github.com/hashicorp/nomad/issues/23510#issuecomment-2256865419):
Ok, so @schmichael and I had a chat and I think we've settled on the idea of introducing an extra_claims block that accepts template strings in the server configuration. So in the Vault block you'll do something like this:
vault {
  address = "https://vault.example.com:8200"
  enabled = true

  default_identity {
    aud = ["vault.io"]
    ttl = "1h"

    extra_claims {
      nomad_workload_id = "${job.namespace}:${job.id}"
      some_other_claim  = "foo"
    }
  }
}
We'll need to do a little investigation to see the exact objects we can expose in those templates, but that's the gist of things.
This allows us to avoid adding lots more claims to the JWT that some users might not need, while giving cluster admins the flexibility they need to meet their requirements for controls. We'll also probably want to add the same feature for a top-level
server.default_identity, but we can do that in follow-up work. That'll cover a lot of the remaining use cases described in https://github.com/hashicorp/nomad/issues/19438.
I think that'll mostly cover what folks want to do here, and if there are use cases leftover, we can discuss what the best way forward is within that context.
Even though it's not mentioned in the release notes, I believe this is released as part of v1.8.3 https://github.com/hashicorp/nomad/pull/23675 :tada:
Almost! Only vault.default_identity can have extra_claims as of 1.8.3, which we think will get folks very far. We're keeping this issue open for a more broadly-reaching ability to add these.
Are there any plans for making extra_claims available at the job level? Our use case involves review apps, which are named something like <project-name>-<branch-name>. Currently with WI we would need to have one entry in vault per review app instead of all review apps for a project being able to share a single entry. It would be great if we could define a static string to a claim (such as parent_project = "foo") within the job to pass to the templated policy in vault.
This issue covers exactly that kind of thing, but it's still unclear how we want to expose it in a way that doesn't allow for easy privilege escalation. The vault.extra_claims we added protects you from job authors creating arbitrary claims by limiting templating to a narrow subset of fields. But there's currently no option to build claims from other metadata beyond the interpolations shown in the docs.
I suspect the answer will be to let cluster admins add claim templates for keys in meta blocks. So for example that would let you have a jobspec with:
meta {
  project = "example"
  branch  = "final-final-2"
}
And then have a vault.extra_claims template like:
vault {
  extra_claims = "${meta.project}-${meta.branch}"
}
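In the block form that actually shipped for vault.default_identity, that hypothetical might look something like the sketch below; note that the ${meta.*} interpolation does not exist today and the claim name is made up.

  vault {
    default_identity {
      aud = ["vault.io"]

      extra_claims {
        # hypothetical: ${meta.*} is not an implemented interpolation target
        review_app = "${meta.project}-${meta.branch}"
      }
    }
  }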
In any case, this isn't currently on the roadmap and is still under discussion. I am going to mark it so that it can get roadmapped though, as it seems like a valuable project.
Looking at what's required to get off the legacy Vault integration and over to workload identities properly, we're going to need something like this, as we've got several related tasks that all end up with different job IDs/parent job IDs: my-service, my-service-cron, my-service-before-task, all of which we want to be able to read from a single, interpolated path in Vault: /kv/my-service/..
The changes to support something similar to what was discussed above with job level metadata look deceptively simple to me -- note I haven't tested them beyond updating + running the tests, and haven't looked to ensure the workload identities are properly regenerated when jobs are updated, etc. Is this somewhat on the right track if I were to properly test + submit a contribution, or does HashiCorp have a more refined suggestion for how we might go about implementing this? (cc @tgross)
diff --git a/nomad/structs/workload_id.go b/nomad/structs/workload_id.go
index f2b2b0c771..50cd9bc4da 100644
--- a/nomad/structs/workload_id.go
+++ b/nomad/structs/workload_id.go
@@ -232,7 +232,7 @@ func (b *IdentityClaimsBuilder) interpolate() {
return
}
- r := strings.NewReplacer(
+ replacements := []string{
// attributes that always exist
"${job.region}", b.job.Region,
"${job.namespace}", b.job.Namespace,
@@ -250,7 +250,18 @@ func (b *IdentityClaimsBuilder) interpolate() {
"${vault.cluster}", strAttrGet(b.vault, func(v *Vault) string { return v.Cluster }),
"${vault.namespace}", strAttrGet(b.vault, func(v *Vault) string { return v.Namespace }),
"${vault.role}", strAttrGet(b.vault, func(v *Vault) string { return v.Role }),
- )
+ }
+
+ // job metadata can also be interpolated
+ if b.job != nil && b.job.Meta != nil {
+ for key, value := range b.job.Meta {
+ metaKey := fmt.Sprintf("${job.meta.%s}", key)
+ replacements = append(replacements, metaKey, value)
+ }
+ }
+
+ r := strings.NewReplacer(replacements...)
+
for k, v := range b.extras {
b.extras[k] = r.Replace(v)
}
diff --git a/nomad/structs/workload_id_test.go b/nomad/structs/workload_id_test.go
index af455bc96b..50effb4eff 100644
--- a/nomad/structs/workload_id_test.go
+++ b/nomad/structs/workload_id_test.go
@@ -25,6 +25,11 @@ func TestNewIdentityClaims(t *testing.T) {
Namespace: "default",
Region: "global",
+ Meta: map[string]string{
+ "meta1": "value1",
+ "meta2": "value2",
+ },
+
TaskGroups: []*TaskGroup{
{
Name: "group",
@@ -236,6 +241,8 @@ func TestNewIdentityClaims(t *testing.T) {
},
ExtraClaims: map[string]string{
"nomad_workload_id": "global:default:parentJob",
+ "nomad_meta1": "value1",
+ "nomad_meta2": "value2",
},
},
"job/group/task/services/task-service": {
@@ -285,6 +292,8 @@ func TestNewIdentityClaims(t *testing.T) {
},
ExtraClaims: map[string]string{
"nomad_workload_id": "global:default:parentJob",
+ "nomad_meta1": "value1",
+ "nomad_meta2": "value2",
},
},
// Use task-level Consul namespace for task services.
@@ -358,6 +367,8 @@ func TestNewIdentityClaims(t *testing.T) {
},
ExtraClaims: map[string]string{
"nomad_workload_id": "global:default:parentJob",
+ "nomad_meta1": "value1",
+ "nomad_meta2": "value2",
},
},
// Use group-level Consul namespace for task service because task
@@ -409,6 +420,8 @@ func TestNewIdentityClaims(t *testing.T) {
},
ExtraClaims: map[string]string{
"nomad_workload_id": "global:default:parentJob",
+ "nomad_meta1": "value1",
+ "nomad_meta2": "value2",
},
},
// Use task-level Consul namespace for task services.
@@ -523,6 +536,8 @@ func TestNewIdentityClaims(t *testing.T) {
WithConsul().
WithVault(map[string]string{
"nomad_workload_id": "${job.region}:${job.namespace}:${job.id}",
+ "nomad_meta1": "${job.meta.meta1}",
+ "nomad_meta2": "${job.meta.meta2}",
}).
Build(now)
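For context, the agent configuration this patch is meant to unlock would look roughly like the following; the claim names and meta keys simply mirror the test fixture above and are otherwise arbitrary.

  vault {
    default_identity {
      aud = ["vault.io"]

      extra_claims {
        nomad_workload_id = "${job.region}:${job.namespace}:${job.id}"
        # the ${job.meta.*} interpolation below is what the patch would add
        nomad_meta1 = "${job.meta.meta1}"
      }
    }
  }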
Hi @chrisboulton! There are really two intertwined proposals under discussion here:
1. Adding job metadata to the interpolation targets available for vault.extra_claims templates.
2. Adding arbitrary claims to the identity block, controlled by the job.
What you're attempting here is (1) and I think we'd have a lot less hesitancy about accepting that than any progress towards (2). Proposal (2) has more security complexity and we just need to do some thinking about it and admittedly haven't got around to it.
As for your implementation for (1), that looks roughly right. In the tests, I'd probably have only a single metadata key in the WithVault arguments, just to ensure we're not interpolating metadata that's unexpected (i.e. keep meta2 in the jobspec but have only meta1 in the extra claims). If you're interested in submitting a PR, I think we'd be happy to get that reviewed.
I'll chime in with my 2 cents again, because this has become something that's still blocking us from ditching our homebrew duct-tape-and-baling-wire solution via yet-another-app. I don't see the security implications in allowing additional arbitrary claims to be added to an identity token (one that isn't used for Vault or Consul) - after all, it's the job of the consuming application to validate the token and its claims.
And that's fine, because realistically I don't want Nomad telling me what I can and can't do (from a security standpoint) with the jobs I run. Assume that if I choose to add additional claims, I've done my risk analysis and I'm aware of any potential pitfalls. If Nomad wishes to tell me what I should and shouldn't do, that's fine, but at least allow me to tell it to shut up and do as it's told.
As I mentioned before, gate it behind a client configuration setting that is off by default, and has to be explicitly enabled in order for additional claims to work - at that point it's very much like "you opted in to this, so if it breaks, you get to keep both pieces".
I don't see the security implications in allowing additional arbitrary claims to be added to an identity token (one that isn't used for Vault or Consul) - after all, it's the job of the consuming application to validate the token and its claims.
I have a job "example" in namespace "developer" and I have an ACL token with namespace "developer" { policy = "write" }. So you're ok with me having an identity that says:
identity {
  extra_claims {
    nomad_namespace = "production"
    nomad_job       = "foo"
  }
}
Probably not? Under the Nomad security model's personas, we expect that a cluster administrator does want tight control over what job authors ("Nomad operator" in that link) can deploy. We need to design those controls (a purely hypothetical sketch of what they might look like follows this list):
- Is this just an on/off boolean feature for the whole cluster?
- Should cluster admins be able to provide allowlist/denylist rules for what claims can be made?
- Can those rules have globs?
- Are those rules opt-in or opt-out?
- Should we reserve key prefixes in claims, so that we can expand the set of claims Nomad writes in the future?
- Should cluster admins be able to assign those rules per-namespace? Per-node pool?
- Should any of these controls be an Enterprise-only feature? If so, what does the reduced version of the control look like in the CE product?
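To make the questions above concrete, an admin-side control might look something like the sketch below. None of these options exist in Nomad; every block and attribute name here is invented purely to illustrate the design space.

  # Entirely hypothetical agent configuration illustrating the design questions
  # above; no such options exist in Nomad today.
  server {
    job_extra_claims {
      enabled      = true                             # on/off for the whole cluster?
      allowed_keys = ["meta.team", "meta.project-*"]  # allowlist, with globs?
      claim_prefix = "jobmeta_"                       # reserved prefix so built-in claims can't be clobbered
    }
  }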
There's a bunch of design work to do, and getting it wrong means screwing up a security component. If you don't need those controls, that's fine, but have some empathy for the many folks who will. You're always welcome to contribute specific ideas or patches.
As I mentioned before, gate it behind a client configuration setting
It would have to be a server-side control, not a client one. Otherwise the same allocation deployed to different nodes would have different claim values (or just fail, depending on how it was implemented).
So you're ok with me having an identity that says:
No... and that's an awful example, because if whoever implements this feature allows extra arbitrary claims to overwrite the claims Nomad generates, well... I would have a fair few choice things to say about that :)
And as far as the list of design bullets goes, a fair few of those hinge on just allowing people to willy-nilly declare additional claims, which would be... *cough* not smart. Of course you want to prevent anything from clobbering existing claims, which means that at least 2 out of the 3 bullets are a "yes" - and whether or not to make it an EE feature, well, no?
And yes, server-side control for the config setting of course.