nomad icon indicating copy to clipboard operation
nomad copied to clipboard

Support for adding description when marking node as ineligible.

Open shishir-a412ed opened this issue 2 years ago • 5 comments

A few team members in our orchestration team (@shivdudhani and @jamesalbert) asked for this, and I felt it would be useful for everyone. During outages and other regular maintenance, operators do the following:

  • Mark a collection of nodes as ineligible e.g. They want to drain the nodes later or some specific criteria for a particular outage.
  • Come back after a couple of hours and check all the nodes which are ineligible - nomad node status | grep ineligible
  • Now they want to mark the nodes back to eligible but not sure why they were marked ineligible in the first place! (It could be nodes marked by another operator as ineligible)

Tracking this information and showing them as part of nomad node status would be useful to the operators.

eligibility

TODO:

  • Add unit tests.
  • Follow up PR for adding description as a flag to Nomad HTTP API.

shishir-a412ed avatar Dec 14 '22 01:12 shishir-a412ed

@shishir-a412ed is attempting to deploy a commit to the HashiCorp Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] avatar Dec 14 '22 01:12 vercel[bot]

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
nomad-storybook-and-ui 🔄 Building (Inspect) Jan 5, 2023 at 1:43PM (UTC)

vercel[bot] avatar Jan 05 '23 13:01 vercel[bot]

I really like what this PR is capturing. Some nitpicks: I think calling IneligibleReason rather than Description more tightly aligns with your intent, since your implementation requires it be "" in the eligible case. Then the flag could be -reason to match that.

Description seems a bit too general to describe the case, since it could reasonably apply to any node regardless of status, and doesn't have the smell of exceptional, IMO. There could also be future value in being able to describe a node in a way other than why it's ineligible; so saving Description for some potential future could prove useful.

I thought another possibility for output might be to present it in the eligibility column parenthetically when set. This would stop from adding a new table header to the tabular output and still let us enhance the output with this new information without changing the experience for folks that don't have quite the same need for it. Just a thought?

ID        DC   Name                        Class   Drain  Eligibility                                      Status
046fbcf4  dc1  nomad-client-3.node.consul  <none>  false  eligible                                         ready
d18649d1  dc1  nomad-client-1.node.consul  <none>  false  eligible                                         ready
5bb73e6b  dc1  nomad-client-2.node.consul  <none>  false  ineligible (due to multiple plan rejections)     ready

Other Nomad engineers might have different opinions, so I'm putting this feedback in as a comment rather than an approval.

@angrycub I like the idea. Description does feel a little too generic! How about renaming the column to Ineligibility Reason (and the flag -ineligibility-reason to match that)?

Re: A separate column - I feel we should not piggyback on the existing Eligibility column. Eligibility defines the state (eligible vs ineligible). Reason should be displayed separately imho. This also helps when filtering out information using tools like awk.

Let me know if this sounds okay? and I will update the PR.

shishir-a412ed avatar Feb 02 '23 19:02 shishir-a412ed

@angrycub Any updates on this one?

shishir-a412ed avatar Mar 01 '23 18:03 shishir-a412ed

Hi @shishir-a412ed 👋

After some internal discussions we feel like adding an extra column to nomad node status that may not be frequently used may be too much for an overall status command.

Since you added this field to the NodeListStub struct it will be available to users through the -t and -json outputs which are better fit for parsing CLI output in scripts. More over, this field is likely to contain arbitrary long text, which may have an impact on legibility as it pushes columns further to the right.

Not displaying this information does create an issue of discoverability: how would a person know that there's more information about the node's ineligible status? One suggestion was to add an asterisk to indicate that more information can be fetched, so something like this:

$ nomad node status
ID        DC   Name               Class   Drain  Eligibility  Status
d42450e0  dc1  laoqui-GDP2M7PY2N  <none>  false  ineligible*   ready

We can then document what the asterisk means, or maybe even output that as footnote in the command output. What do you think about this approach?

lgfa29 avatar Mar 07 '23 01:03 lgfa29