rules_docker icon indicating copy to clipboard operation
rules_docker copied to clipboard

Add the ability to add healthcheck definition for containers

Open svagner opened this issue 3 years ago • 18 comments

Original issue: https://github.com/bazelbuild/rules_docker/issues/770

Adds follow options for container_image:

  • healthcheck_test
  • healthcheck_interval
  • healthcheck_timeout
  • healthcheck_start_period
  • healthcheck_retries

svagner avatar Mar 01 '21 22:03 svagner

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

google-cla[bot] avatar Mar 01 '21 22:03 google-cla[bot]

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: svagner To complete the pull request process, please assign davegay after the PR has been reviewed. You can assign the PR to them by writing /assign @davegay in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Mar 01 '21 22:03 k8s-ci-robot

@googlebot I signed it!

svagner avatar Mar 01 '21 22:03 svagner

cc @xingao267 , @alex1545 as you were in the discussion in the original issue

svagner avatar Mar 01 '21 22:03 svagner

Hiding whitespace changes definitely makes this diff easier to review: https://github.com/bazelbuild/rules_docker/pull/1742/files?w=1

zoidyzoidzoid avatar Mar 11 '21 10:03 zoidyzoidzoid

I don't see a reason why this couldn't be merged. The only concern I have is that we're overloading healthcheck_test to do multiple types of tests (CMD, CMD-SHELL, NONE, etc). Ideally I'd gravitate towards separating these out but I'm not sure that would make sense in this case due to this being a docker-specific rule.

I'm leaning towards thinking this is fine to include but I'd like the opinion of another maintainer just as a sanity check.

gravypod avatar Mar 13 '21 19:03 gravypod

I think it's cool that you put this together, but I'm a little wary of this as-is for a couple of reasons:

  1. It's a bit telling that I've never heard of this feature before today. As far as I can tell, it's not supported in kubernetes but was a feature added on by Docker Inc for swarm specifically. I don't think this feature is going to get much mileage, and it comes at a time when we're trying to simplify this repo rather than add complexity.
  2. Given a Dockerfile only has the HEALTHCHECK command, why does rules_docker now need 5 new options to say the same thing? If there is no parsing library in google/go-containerregistry (upon which this repo depends) to parse that into the necessary config, then that library/function should probably there upstream before merging this. I don't think we should need more than a single new string or string_list attribute healthcheck for this.

pcj avatar Mar 14 '21 20:03 pcj

I don't see a reason why this couldn't be merged. The only concern I have is that we're overloading healthcheck_test to do multiple types of tests (CMD, CMD-SHELL, NONE, etc). Ideally I'd gravitate towards separating these out but I'm not sure that would make sense in this case due to this being a docker-specific rule.

I'm leaning towards thinking this is fine to include but I'd like the opinion of another maintainer just as a sanity check.

Yeah. I also thought that probably it's better to have healthcheck_cmd and healthcheck_cmd_shell options. But it will lead to more complications:(

svagner avatar Mar 18 '21 10:03 svagner

I think it's cool that you put this together, but I'm a little wary of this as-is for a couple of reasons:

  1. It's a bit telling that I've never heard of this feature before today. As far as I can tell, it's not supported in kubernetes but was a feature added on by Docker Inc for swarm specifically. I don't think this feature is going to get much mileage, and it comes at a time when we're trying to simplify this repo rather than add complexity.
  2. Given a Dockerfile only has the HEALTHCHECK command, why does rules_docker now need 5 new options to say the same thing? If there is no parsing library in google/go-containerregistry (upon which this repo depends) to parse that into the necessary config, then that library/function should probably there upstream before merging this. I don't think we should need more than a single new string or string_list attribute healthcheck for this.

Thanks for the feedback. Let me explain my thoughts about it.

  1. These rules are called rules_docker so I believe it means that we should support at least all the main features of docker. I agree that kube doesn't use docker's HEALTHCHECK since it has it's own livenessProbe but this plugin isn't dedicated to kube only is it?
  2. I think having healthcheck_test =" --interval = 5s --timeout = 10s --retries = 3 CMD curl -sS 127.0.0.1 "" will be a little bit confusing for users of these rules since interval / to / retries are optional parameters for health checking right? And unfortunately I don't follow your point about the parsing of this parameter in the google / go-containerregistry`. Since it is a single directive there is not necessary to parse it. But I'm again worrying about moving complications of the definition for health checks from the rules code to users.

Let me know if my arguments make sense to you.

svagner avatar Mar 18 '21 10:03 svagner

I don't see how --interval = 5s --timeout = 10s --retries = 3 CMD curl -sS 127.0.0.1 could possibly be confusing to any user wanting to use this, because that's the syntax Docker came up with and has clearly documented.

And it is definitely necessary to parse it, because the configuration for it is not a simple string. It is https://github.com/moby/moby/blob/46cdcd206c56172b95ba5c77b827a722dab426c5/api/types/container/config.go#L16-L35:

// HealthConfig holds configuration settings for the HEALTHCHECK feature.
type HealthConfig struct {
	// Test is the test to perform to check that the container is healthy.
	// An empty slice means to inherit the default.
	// The options are:
	// {} : inherit healthcheck
	// {"NONE"} : disable healthcheck
	// {"CMD", args...} : exec arguments directly
	// {"CMD-SHELL", command} : run command with system's default shell
	Test []string `json:",omitempty"`

	// Zero means to inherit. Durations are expressed as integer nanoseconds.
	Interval    time.Duration `json:",omitempty"` // Interval is the time to wait between checks.
	Timeout     time.Duration `json:",omitempty"` // Timeout is the time to wait before considering the check to have hung.
	StartPeriod time.Duration `json:",omitempty"` // The start period for the container to initialize before the retries starts to count down.

	// Retries is the number of consecutive failures needed to consider a container as unhealthy.
	// Zero means inherit.
	Retries int `json:",omitempty"`
}

Part of the parser is here: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/parser/line_parsers.go#L343-L369

But rules_docker has no dependency on moby library, it has a dependency on go-containerregistry, which has the same copy of the config: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/parser/line_parsers.go#L343-L369.

Yes, as far as I know, there is no parsing logic in go-containerregistry to transform a string into that config. The parsing logic should be upstreamed there, in the same place that has the config. Once that logic is there, we could call parseHealthcheck here and get the correct config.

Also, rules_docker should not strive to replicate every feature in docker or a Dockerfile. A noteable absence is the RUN feature.

pcj avatar Mar 19 '21 01:03 pcj

Ah, you meant about an unmarshaling line to the struct. Yeah. Makes sense. I'm not sure go-containerregistry will approve this change though since it looks like they don't need such a directive there. I can try to prepare PR for go-containerregistry for it but are we ready to add a health check directive here? I really just don't want to waste reviewers of go-containerregistry time and my time if we don't want this feature at all.

Also, rules_docker should not strive to replicate every feature in docker or a Dockerfile. A noteable absence is the RUN feature.

svagner avatar Mar 19 '21 10:03 svagner

If we can get a parsing function upstreamed and simplify this PR to add only a single healthcheck attribute, I would support it.

pcj avatar Mar 29 '21 19:03 pcj

@svagner will you continue this PR?

medzin avatar Aug 30 '21 10:08 medzin

Hey @jonjohnsonjr what do you think about the proposed upstream change to go-containerregistry?

alexeagle avatar Aug 30 '21 13:08 alexeagle

what do you think about the proposed upstream change to go-containerregistry?

That sounds reasonable to me, something like func ParseHealthConfig(s string) (*HealthConfig, error)?

I'm happy to review a PR.

jonjohnsonjr avatar Aug 30 '21 15:08 jonjohnsonjr

Any progress on this? Would be great to get this added!! If you want to use ecs with out elb/alb for internal services with service discovery the way they support health checks is through docker health check. Without this it becomes an issue.

dgocoder avatar Jan 26 '22 18:01 dgocoder

@dgocoder I'm not sure if @svagner is still around to finish this work (by upstreaming the Go parts) or if it needs a new contributor to step up to finish.

alexeagle avatar Jan 27 '22 13:01 alexeagle

hi, i'm going to take a crack at picking this up and upstreaming said changes. posting to shame myself into followthrough.

pjjw avatar Jul 19 '22 20:07 pjjw

This Pull Request has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!

github-actions[bot] avatar Jan 17 '23 02:01 github-actions[bot]

hey @pjjw, any luck? We would really love to see this merged. currently we need to have a base image created in a different build to include the healthcheck, quite far from the application definition... :/

nadavwe avatar Jan 26 '23 07:01 nadavwe

no, have not plumbed this upstream yet, but this diff works for us as-is.

pjjw avatar Jan 26 '23 15:01 pjjw

This Pull Request has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!

github-actions[bot] avatar Jul 27 '23 02:07 github-actions[bot]

This PR was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

github-actions[bot] avatar Aug 27 '23 02:08 github-actions[bot]