nomad
nomad copied to clipboard
Support for adding description when marking node as ineligible.
A few team members in our orchestration team (@shivdudhani and @jamesalbert) asked for this, and I felt it would be useful for everyone. During outages and other regular maintenance, operators do the following:
- Mark a collection of nodes as ineligible e.g. They want to drain the nodes later or some specific criteria for a particular outage.
- Come back after a couple of hours and check all the nodes which are ineligible -
nomad node status | grep ineligible
- Now they want to mark the nodes back to
eligible
but not sure why they were markedineligible
in the first place! (It could be nodes marked by another operator asineligible
)
Tracking this information and showing them as part of nomad node status
would be useful to the operators.
data:image/s3,"s3://crabby-images/ce3f4/ce3f4050e4055f4d2d3b058467111d10e3a1b011" alt="eligibility"
TODO:
- Add unit tests.
- Follow up PR for adding
description
as a flag to Nomad HTTP API.
@shishir-a412ed is attempting to deploy a commit to the HashiCorp Team on Vercel.
A member of the Team first needs to authorize it.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Updated |
---|---|---|---|
nomad-storybook-and-ui | 🔄 Building (Inspect) | Jan 5, 2023 at 1:43PM (UTC) |
I really like what this PR is capturing. Some nitpicks: I think calling
IneligibleReason
rather thanDescription
more tightly aligns with your intent, since your implementation requires it be "" in theeligible
case. Then the flag could be-reason
to match that.
Description
seems a bit too general to describe the case, since it could reasonably apply to any node regardless of status, and doesn't have the smell of exceptional, IMO. There could also be future value in being able to describe a node in a way other than why it's ineligible; so savingDescription
for some potential future could prove useful.I thought another possibility for output might be to present it in the eligibility column parenthetically when set. This would stop from adding a new table header to the tabular output and still let us enhance the output with this new information without changing the experience for folks that don't have quite the same need for it. Just a thought?
ID DC Name Class Drain Eligibility Status 046fbcf4 dc1 nomad-client-3.node.consul <none> false eligible ready d18649d1 dc1 nomad-client-1.node.consul <none> false eligible ready 5bb73e6b dc1 nomad-client-2.node.consul <none> false ineligible (due to multiple plan rejections) ready
Other Nomad engineers might have different opinions, so I'm putting this feedback in as a comment rather than an approval.
@angrycub I like the idea. Description
does feel a little too generic! How about renaming the column to Ineligibility Reason
(and the flag -ineligibility-reason
to match that)?
Re: A separate column - I feel we should not piggyback on the existing Eligibility
column. Eligibility
defines the state (eligible
vs ineligible
). Reason should be displayed separately imho. This also helps when filtering out information using tools like awk
.
Let me know if this sounds okay? and I will update the PR.
@angrycub Any updates on this one?
Hi @shishir-a412ed 👋
After some internal discussions we feel like adding an extra column to nomad node status
that may not be frequently used may be too much for an overall status
command.
Since you added this field to the NodeListStub
struct it will be available to users through the -t
and -json
outputs which are better fit for parsing CLI output in scripts. More over, this field is likely to contain arbitrary long text, which may have an impact on legibility as it pushes columns further to the right.
Not displaying this information does create an issue of discoverability: how would a person know that there's more information about the node's ineligible status? One suggestion was to add an asterisk to indicate that more information can be fetched, so something like this:
$ nomad node status
ID DC Name Class Drain Eligibility Status
d42450e0 dc1 laoqui-GDP2M7PY2N <none> false ineligible* ready
We can then document what the asterisk means, or maybe even output that as footnote in the command output. What do you think about this approach?