consul
Services Tab "unhealthy" marking
Closes #16550
Hi @jkirschner-hashicorp / @huikang
Please find below the proposed design and implementation.
- "MarkServiceStatusThreshold": a new configuration parameter introduced inside UIConfig { }.
- Its default value is set to 0.6 (i.e. 60%). We can update this once we arrive at the most acceptable default.
- Based on this parameter's value, the service is marked red (critical), green (passing), or warning in the Services tab. Consider this scenario: if 6 out of 10 instances (>= 60%) have all checks passing, the service is marked green rather than red, even though the other 4 instances may have critical or warning checks.
- The documentation still needs to be updated for this (to do).
- Three new fields are introduced in the "ServiceSummary" struct: "InstancesPassing", "InstancesWarning", and "InstancesCritical". The purpose of these fields and the values they hold are explained below:
- If a node instance has all health checks passing, it adds to the "InstancesPassing" count.
- If a node instance has at least 1 critical health check, it adds to the "InstancesCritical" count.
- If a node instance has 0 critical health checks but at least 1 warning health check, it adds to the "InstancesWarning" count.
- Based on the values of these new fields and the new config parameter ("MarkServiceStatusThreshold"), both returned in the "/v1/internal/ui/services" API response, the UI decides whether to mark the service as green or not.
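As a rough illustration of the counting rules above, here is a minimal Go sketch. It is not the code from this PR: the `CheckStatus` type, the `InstanceCounts` struct, and the `classifyInstance` and `countInstances` helpers are hypothetical names used only for this example; only the three field names come from the description. It also assumes that an instance with no health checks counts as passing.

```go
package main

import "fmt"

// CheckStatus is a hypothetical, simplified stand-in for a health check state.
type CheckStatus string

const (
	Passing  CheckStatus = "passing"
	Warning  CheckStatus = "warning"
	Critical CheckStatus = "critical"
)

// InstanceCounts mirrors the three new fields described above.
// The struct itself is illustrative, not Consul's actual ServiceSummary.
type InstanceCounts struct {
	InstancesPassing  int
	InstancesWarning  int
	InstancesCritical int
}

// classifyInstance reduces the checks of one instance to a single bucket:
// any critical check wins, otherwise any warning check, otherwise passing.
// Assumption: an instance with no checks is treated as passing.
func classifyInstance(checks []CheckStatus) CheckStatus {
	result := Passing
	for _, c := range checks {
		switch c {
		case Critical:
			return Critical
		case Warning:
			result = Warning
		}
	}
	return result
}

// countInstances tallies instances per bucket; each inner slice holds the
// check states of one instance.
func countInstances(instances [][]CheckStatus) InstanceCounts {
	var counts InstanceCounts
	for _, checks := range instances {
		switch classifyInstance(checks) {
		case Critical:
			counts.InstancesCritical++
		case Warning:
			counts.InstancesWarning++
		default:
			counts.InstancesPassing++
		}
	}
	return counts
}

func main() {
	counts := countInstances([][]CheckStatus{
		{Passing, Passing},  // all checks passing
		{Passing, Warning},  // no critical, one warning
		{Warning, Critical}, // at least one critical
	})
	fmt.Printf("%+v\n", counts)
	// prints {InstancesPassing:1 InstancesWarning:1 InstancesCritical:1}
}
```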
The complete UI tests and the most relevant backend tests have been run and verified. Please find snapshots of the results below.
FYI @reskin89
@vijayraghav-io Thanks for the PR. Are there any UI changes to the services view? Could you provide an example of this in action?
Sure @david-yu, below are snapshots of the services view with the changes, for different scenarios.
- Number of healthy instances > threshold (default 60%) of total instances
- Number of healthy instances < threshold (default 60%) of total instances
- Zero healthy instances
- Warning instances without critical
- Tooltips added for the new info
Hi @david-yu / @huikang / @jkirschner-hashicorp,
A reminder to review this PR and share any feedback. Thank you!
This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.
Leaving a comment to prevent this PR from auto-closing
Bump this please, it's an important QoL
Hi @david-yu , please help to review.
Hi @david-yu, @reskin89,
I added these videos for a better understanding and to show the implemented feature in action.
Consider this scenario: a service "web" runs on 3 instances. The health check is failing on only one of the instances; the other 2 instances have all health checks passing.
- This video is as-is (without new feature implemented)
https://www.loom.com/share/446d5c19e61c4b1a881d7117333e00e6?sid=dd32a8fc-2128-4e72-9173-58b61fd703df
- This video is with new feature implemented for same scenario https://www.loom.com/share/c36a8b66b6a0447f8ecee07cc3530a98?sid=26517e61-557e-41fb-8aed-b7d1e62b1308
The overall status for "web" is green when the threshold is set to 0.6 in the config. The overall status is marked critical when the threshold is set to 0.7 (which means at least 70% of instances must have all health checks passing for the overall status to be marked green).
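To make the arithmetic in this scenario concrete, here is a small Go sketch of the threshold comparison. It is only an illustration under stated assumptions, not the UI code from this PR: `serviceStatus` is a hypothetical helper, and the fallback behavior below the threshold (critical if any instance is critical, otherwise warning) and the "unknown" result for a service with no instances are assumptions made for the example.

```go
package main

import "fmt"

// serviceStatus is a hypothetical helper that applies the threshold rule:
// the service is "passing" (green) when the share of instances with all
// checks passing is at least the threshold.
// Assumptions for this sketch: below the threshold the service is "critical"
// if any instance is critical, otherwise "warning"; a service with no
// instances reports "unknown".
func serviceStatus(passing, warning, critical int, threshold float64) string {
	total := passing + warning + critical
	if total == 0 {
		return "unknown"
	}
	ratio := float64(passing) / float64(total)
	if ratio >= threshold {
		return "passing"
	}
	if critical > 0 {
		return "critical"
	}
	return "warning"
}

func main() {
	// "web" on 3 instances: 2 fully passing, 1 with a failing check.
	// 2/3 is about 0.667, which is above 0.6 but below 0.7.
	fmt.Println(serviceStatus(2, 0, 1, 0.6)) // prints "passing"
	fmt.Println(serviceStatus(2, 0, 1, 0.7)) // prints "critical"
}
```

With 6 of 10 instances passing and the default threshold of 0.6, the same comparison yields "passing", which matches the example in the PR description.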
@jkirschner-hashicorp does this look like something we'd be able to merge soon? This is a really awesome QoL for the UI
Hi @vijayraghav-io and @reskin89: I'm acknowledging that I saw your messages and the team is internally discussing this (the proposed solution and the underlying problem it intends to address). I'll follow up after that internal discussion settles. In the meantime, thank you for your contributions and this conversation!
Hi @jkirschner-hashicorp, thank you for the response and for considering this further. Looking forward to the follow-up, or to any discussion or queries. Thank you!
A reminder to review this PR.