raise an error if group has Consul namespace set for non-Enterprise cluster

Open wimax-grapl opened this issue 2 years ago • 3 comments

Nomad version

Output from nomad version 1.3.5

Operating system and Environment details

Ubuntu 22

Issue

If you have a Group with the following definition:

group {
  consul { namespace = var.some_namespace } 
  service "generator-plugin" {}
  service "generator-execution-sidecar" {}
}

I would expect every instance of this group to exist in a different namespace (at least in Enterprise). However, my cluster isn't Enterprise yet, and Nomad happily accepts this job and then dumps these services into the only (default) namespace. This causes an unexpected round-robin situation, since all of the instances end up registered under the same service names in one namespace.

The Consul CLI gives you a pretty harsh error if you try to do anything namespace-related on a non-Enterprise cluster; I'd hope that Nomad's integration with Consul would do the same.

consul catalog services -namespace default
Error listing services: Unexpected response code: 400 (Bad request: Invalid query parameter: "ns" - Namespaces are a Consul Enterprise feature)

Reproduction steps

Deploy the same Group multiple times with different consul { namespace = } values on a Nomad cluster with non-enterprise Consul.
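
A minimal, untested sketch of what such a reproduction might look like is below. The job name, group names, namespace values, datacenter, and task are all placeholders chosen for illustration and are not taken from the linked job file; the services use the `name` attribute form rather than the shorthand shown above.

```hcl
# Hypothetical reproduction job: every name and value here is a placeholder.
job "consul-ns-repro" {
  datacenters = ["dc1"]

  # First instance, intended for its own Consul namespace.
  group "instance-a" {
    consul {
      namespace = "tenant-a"
    }

    service {
      name = "generator-plugin"
    }

    task "sleep" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "sleep"
        args    = ["3600"]
      }
    }
  }

  # Second instance, same service name, different intended namespace.
  group "instance-b" {
    consul {
      namespace = "tenant-b"
    }

    service {
      name = "generator-plugin"
    }

    task "sleep" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "sleep"
        args    = ["3600"]
      }
    }
  }
}
```

If the behavior described in this issue holds, both `generator-plugin` registrations would show up together in the default namespace when Consul is not Enterprise.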

Expected Result

Explicit failure - refusal to deploy the job, perhaps?

Actual Result

They all get silently dumped in the default namespace

Job file (if appropriate)

I wouldn't suggest trying this one yourself, way too many prerequisites, but I discovered this with https://github.com/grapl-security/grapl/pull/2026/files#diff-47f0314c3995de007f9d705f4ea0b1f681b482df1f5fa3618ac8a11613599a19

wimax-grapl, Oct 05 '22 17:10

Hi @wimax-grapl! I did some diving into the code and I think I see why it's doing this currently. The client agent is what's talking to Consul here, and so while the client knows whether it's talking to Consul Enterprise or not, the server doesn't. So it would be challenging to surface this information to the server to pick it up at job submit time. That being said, it should be possible to do this at allocation placement time, but the allocation would fail and then be rescheduled until it runs out of reschedules. That's not a great user experience either.

But I'm going to mark this as an enhancement for further discussion and roadmapping. Thanks for opening the issue!

tgross, Oct 05 '22 19:10

Yep, totally makes sense that it'd be hard to surface to users. If there were perhaps a way to surface Consul Enterprise as a Resource that a cluster needs - like disk space or mem or something - that could be a reasonable way to expose it to the customer.

wimax-grapl, Oct 05 '22 20:10

> If there were perhaps a way to surface Consul Enterprise as a Resource that a cluster needs

I think this is already possible, if you were to create a constraint on the attribute ${attr.consul.sku}, e.g.

➜ nomad node status -self -verbose | grep consul\.sku
consul.sku                = oss

(and the Enterprise version would be ent)

So something like this (haven't tested)

constraint {
  attribute = "${attr.consul.sku}"
  value     = "ent"
}
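
For context, here is an equally untested sketch of how that constraint might sit next to the `consul` block inside a group. It is a jobspec fragment, not a complete job: the group name and namespace value are placeholders, and the group would still need its tasks.

```hcl
# Untested fragment: group name and namespace value are placeholders.
group "generator" {
  # Only place this group on clients whose fingerprinted Consul is Enterprise.
  constraint {
    attribute = "${attr.consul.sku}"
    value     = "ent"
  }

  consul {
    namespace = "tenant-a"
  }

  service {
    name = "generator-plugin"
  }

  # task blocks omitted for brevity
}
```

On a cluster where every client reports `consul.sku = oss`, the expectation is that no node would satisfy the constraint, so the group would stay unplaced instead of registering its services in the default namespace. That outcome is an assumption and has not been verified here.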

shoenig avatar Oct 11 '22 14:10 shoenig