consul icon indicating copy to clipboard operation
consul copied to clipboard

Feature: Health checks windows service

Open deblasis opened this issue 3 years ago • 14 comments

Description

To provide windows service health check functionality addressing the issue #10637

Testing & Reproduction steps

The PR is Draft because I would like you guys to point me to your preferred way of mocking syscalls and I will update the feature branch with the relevant tests.

Links

  • Issue: #10637
  • PRD (kinda 🙂): https://gist.github.com/deblasis/b6dd645a8bd5e2b3c9b95bb30bbb238c

PR Checklist

  • [x] updated test coverage
  • [x] external facing docs updated
  • [x] not a security concern

deblasis avatar Jun 07 '22 17:06 deblasis

Hey @deblasis

Super excited to see this PR come through, thank you! After looking ( and asking ) around it doesn't look like we have any established pattern for mocking syscalls. You're free to add the tests in whenever you get a chance 👍 😄

Amier3 avatar Jun 08 '22 20:06 Amier3

Hey @deblasis

Super excited to see this PR come through, thank you! After looking ( and asking ) around it doesn't look like we have any established pattern for mocking syscalls. You're free to add the tests in whenever you get a chance 👍 😄

Hi @Amier3 !

I have added syscall mocking and testing, I guess this is now finally ready for review! 😁

GOOS=windows go test -tags=windows ./agent/checks -run TestCheck_OSService -v

image

deblasis avatar Jun 09 '22 14:06 deblasis

@deblasis : for context, our education/docs team has performed a review (above). The engineering review still needs to be conducted! I'd recommend waiting for an engineering review before making changes so you can address all the feedback in one go.

jkirschner-hashicorp avatar Jun 15 '22 16:06 jkirschner-hashicorp

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

github-actions[bot] avatar Aug 15 '22 01:08 github-actions[bot]

Hi guys, sorry for the delay. I have updated the PR taking into account your feedback and I made a small improvement in the reported service state that's now in string format.

Please also check if the updated website/content/docs/discovery/checks.mdx from main is OK since you guys updated the format and I had to make a couple of adjustments.

deblasis avatar Aug 28 '22 17:08 deblasis

@deblasis would you be able to rebase or merge from main? It should have fixes for the failing integration tests

kisunji avatar Aug 30 '22 17:08 kisunji

Thanks! Tests are all passing. Will make some time this week to test on Windows

kisunji avatar Sep 06 '22 13:09 kisunji

Apologies for the delay; I didn't have time this week but will make sure to test next week. I will also post a comment on the original github issue to see if there is anyone who would like to test out the feature.

kisunji avatar Sep 16 '22 19:09 kisunji

I discovered a few issues while testing:

  1. We need to initialize checkOSServices otherwise there is a panic due to assigning to a nil map
  2. (not a blocker) UI does not recognize OS as a type of check so it appears as a "Serf Check" in the UI.
  3. I am seeing the following in the logs:
2022-09-20T17:29:34.057-0400 [DEBUG] agent: Check failed: check=service-check error="error accessing service: The handle is invalid."

kisunji avatar Sep 20 '22 21:09 kisunji

Reproduction steps:

// config/nodecheck.hcl
check = {
  name = "node-check"
  os_service = "AdobeARMService"
  interval = "10s"
}

// config/svccheck.hcl
check = {
  name = "service-check"
  os_service = "AdobeARMService"
  service_id = "hello"
  interval = "10s"
}
# powershell as admin
.\consul.exe agent -dev -config-dir=".\config"

Confirming that AdodeARMService is running

PS C:\Users\Chris> Get-Service -Name "AdobeARMService"

Status   Name               DisplayName
------   ----               -----------
Running  AdobeARMservice    Adobe Acrobat Update Service

kisunji avatar Sep 20 '22 21:09 kisunji

Hi @kisunji! Sorry about that. But I have good news 👇

  • We need to initialize checkOSServices otherwise there is a panic due to assigning to a nil map

Addressed in https://github.com/hashicorp/consul/pull/13388/commits/fc0dd92dcf54e6a06abce5c24b47464263fca94f

  • (not a blocker) UI does not recognize OS as a type of check so it appears as a "Serf Check" in the UI.

I have noticed that I missed some properties and the corresponding generated mappings (fixed in https://github.com/hashicorp/consul/pull/13388/commits/461b42ed480cd39e8a41eb9207470958420ff2ce) but however I believe there's a bug elsewhere, I have added another check to... check 🙂 (repro config below) and I am seeing serf in there as well: image

Since I was looking at the UI, I noticed the lack of message so I have added a "Healthy" message with https://github.com/hashicorp/consul/pull/13388/commits/5719fd65607b8af263ed06455cd6affe75c0d0c4

  • I am seeing the following in the logs:
2022-09-20T17:29:34.057-0400 [DEBUG] agent: Check failed: check=service-check error="error accessing service: The handle is invalid."

Addressed in https://github.com/hashicorp/consul/pull/13388/commits/f440966a386ad8a9140ac44d72bdb1f4fa0eaf7d

I have tested the above with:

// config/nodecheck.hcl
check = {
  name = "node-check"
  os_service = "Dhcp"
  interval = "10s"
}

// config/h2ping.hcl
check = {
  id = "h2ping-check"
  name = "h2ping"
  h2ping = "localhost:22222"
  interval = "10s"
  h2ping_use_tls = false
}

deblasis avatar Sep 21 '22 12:09 deblasis

Ah it must be a UI bug then. Thanks for the quick updates, will try to test again soon

kisunji avatar Sep 21 '22 14:09 kisunji

@johncowen: any ideas on why newer health check types (os_service from this PR, h2ping introduced a few months ago) would show up as a "Serf" health check on the UI?

jkirschner-hashicorp avatar Sep 21 '22 17:09 jkirschner-hashicorp

If the Type field in the HTTP response that the UI uses is empty then we assume that is a serf check. Possibly or possibly not useful UI PR https://github.com/hashicorp/consul/pull/10194

I would guess that the new healthchecks are coming out with empty Types in the HTTP responses.

Umm separate thing but if we have new healthcheck types then the UI code might need to be aware of those for menus etc. not totally sure if we inspect those or if they are hardcoded.

At the very least UI should add these possible values to our mock API I think

johncowen avatar Sep 21 '22 17:09 johncowen

Hi @johncowen ! Nice seeing you again around here (you reviewed a PR of mine back in the day) :)

If the Type field in the HTTP response that the UI uses is empty then we assume that is a serf check. Possibly or possibly not useful UI PR #10194

I would guess that the new healthchecks are coming out with empty Types in the HTTP responses.

I can confirm that this is the case, the Type is empty in the HTTP responses even for "older" checks like h2ping.

deblasis avatar Sep 21 '22 21:09 deblasis

@trujillo-adam:

Had a few suggestions and questions w/r/t the docs. You did a really good job on the main description of the OS service check, BTW.

Suggestions accepted! Thanks but @jkirschner-hashicorp and yourself helped a lot already giving it the first pass a few weeks ago. My initial version was not that clear. Now it looks pretty good. 👍

deblasis avatar Sep 23 '22 06:09 deblasis

👍

On Mon, 29 Aug 2022 at 18:05, Chris S. Kim @.***> wrote:

@.**** commented on this pull request.

In agent/agent.go https://github.com/hashicorp/consul/pull/13388#discussion_r957584809:

  •   	if prev := a.checkOSServices[cid]; prev != nil {
    
  • 		prev.Stop()
    
  • 	}
    

I wouldn't mind if you just removed the line from the docker case as well

— Reply to this email directly, view it on GitHub https://github.com/hashicorp/consul/pull/13388#discussion_r957584809, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHAEQNW57SXDCXXOIMI2FPLV3TUVJANCNFSM5YDYBKAQ . You are receiving this because you were mentioned.Message ID: @.***>

deblasis avatar Oct 11 '22 09:10 deblasis

I think it should be OK now. Shall we🚢 it ? 😄

deblasis avatar Oct 14 '22 11:10 deblasis

@deblasis I've tested this successfully on Windows so this is good to 🚢 !

kisunji avatar Oct 17 '22 13:10 kisunji