consul
consul copied to clipboard
Feature: Health checks windows service
Description
To provide windows service health check functionality addressing the issue #10637
Testing & Reproduction steps
The PR is Draft because I would like you guys to point me to your preferred way of mocking syscalls and I will update the feature branch with the relevant tests.
Links
- Issue: #10637
- PRD (kinda 🙂): https://gist.github.com/deblasis/b6dd645a8bd5e2b3c9b95bb30bbb238c
PR Checklist
- [x] updated test coverage
- [x] external facing docs updated
- [x] not a security concern
Hey @deblasis
Super excited to see this PR come through, thank you! After looking ( and asking ) around it doesn't look like we have any established pattern for mocking syscalls. You're free to add the tests in whenever you get a chance 👍 😄
Hey @deblasis
Super excited to see this PR come through, thank you! After looking ( and asking ) around it doesn't look like we have any established pattern for mocking syscalls. You're free to add the tests in whenever you get a chance 👍 😄
Hi @Amier3 !
I have added syscall mocking and testing, I guess this is now finally ready for review! 😁
GOOS=windows go test -tags=windows ./agent/checks -run TestCheck_OSService -v

@deblasis : for context, our education/docs team has performed a review (above). The engineering review still needs to be conducted! I'd recommend waiting for an engineering review before making changes so you can address all the feedback in one go.
This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.
Hi guys, sorry for the delay. I have updated the PR taking into account your feedback and I made a small improvement in the reported service state that's now in string format.
Please also check if the updated website/content/docs/discovery/checks.mdx from main is OK since you guys updated the format and I had to make a couple of adjustments.
@deblasis would you be able to rebase or merge from main? It should have fixes for the failing integration tests
Thanks! Tests are all passing. Will make some time this week to test on Windows
Apologies for the delay; I didn't have time this week but will make sure to test next week. I will also post a comment on the original github issue to see if there is anyone who would like to test out the feature.
I discovered a few issues while testing:
- We need to initialize
checkOSServicesotherwise there is a panic due to assigning to a nil map - (not a blocker) UI does not recognize OS as a type of check so it appears as a "Serf Check" in the UI.
- I am seeing the following in the logs:
2022-09-20T17:29:34.057-0400 [DEBUG] agent: Check failed: check=service-check error="error accessing service: The handle is invalid."
Reproduction steps:
// config/nodecheck.hcl
check = {
name = "node-check"
os_service = "AdobeARMService"
interval = "10s"
}
// config/svccheck.hcl
check = {
name = "service-check"
os_service = "AdobeARMService"
service_id = "hello"
interval = "10s"
}
# powershell as admin
.\consul.exe agent -dev -config-dir=".\config"
Confirming that AdodeARMService is running
PS C:\Users\Chris> Get-Service -Name "AdobeARMService"
Status Name DisplayName
------ ---- -----------
Running AdobeARMservice Adobe Acrobat Update Service
Hi @kisunji! Sorry about that. But I have good news 👇
- We need to initialize
checkOSServicesotherwise there is a panic due to assigning to a nil map
Addressed in https://github.com/hashicorp/consul/pull/13388/commits/fc0dd92dcf54e6a06abce5c24b47464263fca94f
- (not a blocker) UI does not recognize OS as a type of check so it appears as a "Serf Check" in the UI.
I have noticed that I missed some properties and the corresponding generated mappings (fixed in https://github.com/hashicorp/consul/pull/13388/commits/461b42ed480cd39e8a41eb9207470958420ff2ce) but however I believe there's a bug elsewhere, I have added another check to... check 🙂 (repro config below) and I am seeing serf in there as well:

Since I was looking at the UI, I noticed the lack of message so I have added a "Healthy" message with https://github.com/hashicorp/consul/pull/13388/commits/5719fd65607b8af263ed06455cd6affe75c0d0c4
- I am seeing the following in the logs:
2022-09-20T17:29:34.057-0400 [DEBUG] agent: Check failed: check=service-check error="error accessing service: The handle is invalid."
Addressed in https://github.com/hashicorp/consul/pull/13388/commits/f440966a386ad8a9140ac44d72bdb1f4fa0eaf7d
I have tested the above with:
// config/nodecheck.hcl
check = {
name = "node-check"
os_service = "Dhcp"
interval = "10s"
}
// config/h2ping.hcl
check = {
id = "h2ping-check"
name = "h2ping"
h2ping = "localhost:22222"
interval = "10s"
h2ping_use_tls = false
}
Ah it must be a UI bug then. Thanks for the quick updates, will try to test again soon
@johncowen: any ideas on why newer health check types (os_service from this PR, h2ping introduced a few months ago) would show up as a "Serf" health check on the UI?
If the Type field in the HTTP response that the UI uses is empty then we assume that is a serf check. Possibly or possibly not useful UI PR https://github.com/hashicorp/consul/pull/10194
I would guess that the new healthchecks are coming out with empty Types in the HTTP responses.
Umm separate thing but if we have new healthcheck types then the UI code might need to be aware of those for menus etc. not totally sure if we inspect those or if they are hardcoded.
At the very least UI should add these possible values to our mock API I think
Hi @johncowen ! Nice seeing you again around here (you reviewed a PR of mine back in the day) :)
If the
Typefield in the HTTP response that the UI uses is empty then we assume that is a serf check. Possibly or possibly not useful UI PR #10194I would guess that the new healthchecks are coming out with empty
Types in the HTTP responses.
I can confirm that this is the case, the Type is empty in the HTTP responses even for "older" checks like h2ping.
@trujillo-adam:
Had a few suggestions and questions w/r/t the docs. You did a really good job on the main description of the OS service check, BTW.
Suggestions accepted! Thanks but @jkirschner-hashicorp and yourself helped a lot already giving it the first pass a few weeks ago. My initial version was not that clear. Now it looks pretty good. 👍
👍
On Mon, 29 Aug 2022 at 18:05, Chris S. Kim @.***> wrote:
@.**** commented on this pull request.
In agent/agent.go https://github.com/hashicorp/consul/pull/13388#discussion_r957584809:
if prev := a.checkOSServices[cid]; prev != nil {
prev.Stop()}I wouldn't mind if you just removed the line from the docker case as well
— Reply to this email directly, view it on GitHub https://github.com/hashicorp/consul/pull/13388#discussion_r957584809, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHAEQNW57SXDCXXOIMI2FPLV3TUVJANCNFSM5YDYBKAQ . You are receiving this because you were mentioned.Message ID: @.***>
I think it should be OK now. Shall we🚢 it ? 😄
@deblasis I've tested this successfully on Windows so this is good to 🚢 !