
Services Tab "unhealthy" marking

Open vijayraghav-io opened this issue 2 years ago • 15 comments

Closes #16550

Hi @jkirschner-hashicorp / @huikang

Please find below the proposed design and implementation.

  1. "MarkServiceStatusThreshold": a new configuration parameter is introduced inside UIConfig { }.
  • The default value is 0.6 (i.e. 60%). We can update it once we agree on the most acceptable default.
  • Based on this parameter's value, the service is marked red (critical), green (passing), or warning in the Services tab. For example, if 6 out of 10 instances (>= 60%) have all checks passing, the service is marked green rather than red, even though the other 4 instances may have critical or warning checks.
  • This still needs to be added to the documentation (to do).
  2. Three new fields are introduced in the "ServiceSummary" struct: "InstancesPassing", "InstancesWarning", and "InstancesCritical". The purpose of these fields and the values they hold are explained below:
  • If a node instance has all health checks passing, it adds to the "InstancesPassing" count.
  • If a node instance has at least 1 critical health check, it adds to the "InstancesCritical" count.
  • If a node instance has 0 critical health checks but at least 1 warning health check, it adds to the "InstancesWarning" count.
  3. Based on the values of these new fields and the new config parameter ("MarkServiceStatusThreshold") returned in the "/v1/internal/ui/services" API response, the UI decides whether to mark the service as green.
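
The counting rules for the three new fields can be sketched in Go roughly as follows. This is only an illustration, not the code from the PR: the names instanceCounts, classifyInstance, and countInstances are hypothetical, and counting an instance with no checks as passing is an assumption.

```go
package main

import "fmt"

// Health check statuses as reported by Consul.
const (
	statusPassing  = "passing"
	statusWarning  = "warning"
	statusCritical = "critical"
)

// instanceCounts is a hypothetical stand-in for the three proposed
// ServiceSummary fields.
type instanceCounts struct {
	InstancesPassing  int
	InstancesWarning  int
	InstancesCritical int
}

// classifyInstance reduces the check statuses of one instance to a single
// status: critical if any check is critical, warning if there is no critical
// check but at least one warning check, passing otherwise.
// Assumption: an instance with no checks counts as passing.
func classifyInstance(checks []string) string {
	worst := statusPassing
	for _, c := range checks {
		switch c {
		case statusCritical:
			return statusCritical
		case statusWarning:
			worst = statusWarning
		}
	}
	return worst
}

// countInstances tallies instances per status; each inner slice holds the
// check statuses of one instance.
func countInstances(instances [][]string) instanceCounts {
	var out instanceCounts
	for _, checks := range instances {
		switch classifyInstance(checks) {
		case statusCritical:
			out.InstancesCritical++
		case statusWarning:
			out.InstancesWarning++
		default:
			out.InstancesPassing++
		}
	}
	return out
}

func main() {
	instances := [][]string{
		{"passing", "passing"},
		{"passing", "warning"},
		{"warning", "critical"},
	}
	// Prints: {InstancesPassing:1 InstancesWarning:1 InstancesCritical:1}
	fmt.Printf("%+v\n", countInstances(instances))
}
```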

The complete UI test suite and the most relevant backend (app) tests have been run and verified. Please find a snapshot of the same below. Issue_16550_ServicesTab_1

FYI @reskin89

vijayraghav-io avatar Aug 14 '23 14:08 vijayraghav-io

@vijayraghav-io Thanks for the PR. Are there any UI changes to the services view? Could you perhaps provide an example of this in action?

david-yu avatar Aug 16 '23 06:08 david-yu

Sure @david-yu, below are snapshots of the services view with the changes, for different scenarios.

  1. Number of healthy instances > threshold (default 60%) of total instances Issue_16550_ServicesTab_4

  2. Number of healthy instances < threshold (default 60%) of total instances Issue_16550_ServicesTab_3

  3. Zero healthy instances Issue_16550_ServicesTab_5

  4. Warning instances without critical Issue_16550_ServicesTab_6

  5. Tooltips added for the new info Issue_16550_ServicesTab_7

vijayraghav-io avatar Aug 16 '23 15:08 vijayraghav-io

Hi @david-yu / @huikang / @jkirschner-hashicorp ,

A reminder to review this PR and share any feedback. Thank you!

vijayraghav-io avatar Sep 04 '23 14:09 vijayraghav-io

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

github-actions[bot] avatar Nov 04 '23 01:11 github-actions[bot]

Leaving a comment to prevent this PR from auto-closing

vijayraghav-io avatar Dec 09 '23 12:12 vijayraghav-io

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

github-actions[bot] avatar Feb 18 '24 01:02 github-actions[bot]

Bump this please, it's an important QoL

reskin89 avatar Feb 18 '24 01:02 reskin89

Hi @david-yu, please help review this.

vijayraghav-io avatar Feb 18 '24 08:02 vijayraghav-io

Hi @david-yu , @reskin89 ,

I added these videos to give a better understanding and to show the implemented feature in action.

Consider this scenario: a service "web" runs on 3 instances. The health check is failing on only one instance; the other 2 instances have all health checks passing.

  1. This video shows the as-is behavior (without the new feature implemented):

https://www.loom.com/share/446d5c19e61c4b1a881d7117333e00e6?sid=dd32a8fc-2128-4e72-9173-58b61fd703df

  2. This video shows the same scenario with the new feature implemented: https://www.loom.com/share/c36a8b66b6a0447f8ecee07cc3530a98?sid=26517e61-557e-41fb-8aed-b7d1e62b1308

The overall status for web is green when the threshold is set to 0.6 in the config, because 2 of 3 instances (about 67%) have all health checks passing. The overall status is marked critical when the threshold is set to 0.7 (which means at least 70% of instances must have all health checks passing for the overall status to be marked green).
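
The arithmetic behind this scenario can be sketched as below. This is a rough Go illustration of the threshold comparison only; in the PR the decision is made in the UI, and the function name serviceStatus is hypothetical. Two parts are assumptions: how a service that misses the threshold is split between critical and warning, and the "unknown" placeholder for a service with no instances.

```go
package main

import "fmt"

// serviceStatus compares the fraction of fully passing instances against the
// threshold (MarkServiceStatusThreshold in the proposal).
// Assumptions: below the threshold the service is critical if any instance is
// critical and warning otherwise; a service with no instances yields the
// placeholder "unknown".
func serviceStatus(passing, warning, critical int, threshold float64) string {
	total := passing + warning + critical
	if total == 0 {
		return "unknown"
	}
	if float64(passing)/float64(total) >= threshold {
		return "passing"
	}
	if critical > 0 {
		return "critical"
	}
	return "warning"
}

func main() {
	// "web" scenario: 3 instances, 2 fully passing, 1 with a failing check.
	for _, t := range []float64{0.6, 0.7} {
		fmt.Printf("threshold %.1f -> %s\n", t, serviceStatus(2, 0, 1, t))
	}
	// Prints:
	// threshold 0.6 -> passing
	// threshold 0.7 -> critical
}
```

Here 2/3 is roughly 0.667, which clears 0.6 but not 0.7, matching what the two videos show.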

vijayraghav-io avatar Feb 19 '24 13:02 vijayraghav-io

@jkirschner-hashicorp does this look like something we'd be able to merge soon? This is a really awesome QoL for the UI

reskin89 avatar Feb 19 '24 13:02 reskin89

Hi @vijayraghav-io and @reskin89: I'm acknowledging that I saw your messages and the team is internally discussing this (the proposed solution and the underlying problem it intends to address). I'll follow up after that internal discussion settles. In the meantime, thank you for your contributions and this conversation!

jkirschner-hashicorp avatar Feb 20 '24 18:02 jkirschner-hashicorp

Hi @jkirschner-hashicorp, thank you for the response and for considering it further. Looking forward to the follow-up, and to any discussion or questions. Thank you!

vijayraghav-io avatar Feb 21 '24 05:02 vijayraghav-io

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

github-actions[bot] avatar May 28 '24 01:05 github-actions[bot]

A reminder to review this PR.

vijayraghav-io avatar May 28 '24 07:05 vijayraghav-io

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

github-actions[bot] avatar Jul 30 '24 01:07 github-actions[bot]