Health check timeout (container state: unhealthy)
Subject of the issue
The step-ca container's health state is first shown as Up (health: starting) and later turns to Up (unhealthy).
The service itself runs fine and logs that it is listening, so apparently the health check always fails.
Your environment
- OS: WSL 2
- Version: 2 (Windows 10 x64)
Steps to reproduce
docker-compose.yml:
version: '3.7'
services:
  # Smallstep Step CA
  step-ca:
    image: smallstep/step-ca:0.15.6
    restart: always
> docker-compose up -d
> docker-compose ps
Up (health: starting)
# after some time (about one minute)
> docker-compose ps
Up (unhealthy)
Expected behaviour
As the CA service runs correctly, the health check should pass and the container state should become Up (healthy) or similar, not Up (unhealthy). The health check also takes too long: the container stays in (health: starting) for about a minute.
Actual behaviour
The health check takes too long (Up (health: starting)) and then fails after about a minute (Up (unhealthy)).
@strarsis Thanks for the report.
I have a couple questions.
- Are you able to reach the CA's health check endpoint and get a {"status":"ok"} response? The endpoint is https://ca_host:port/health
- I'm not able to reproduce your example docker-compose.yml. The container comes up but the CA doesn't run. How did you initialize your PKI? Do you use a volume mount to store the configuration?
When I try to reproduce this with our Docker tutorial, the container health check works.
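For anyone who wants to try the endpoint check asked about above, here is a minimal sketch. The helper name `health_body_ok` is made up for illustration, and the curl invocation in the comment assumes the root certificate path printed by `step ca init` and that curl is available wherever you run it; adjust both to your setup.

```shell
# Succeeds when a /health response body reports status "ok".
# Whitespace is stripped first so formatting differences in the body do not matter.
health_body_ok() {
  printf '%s' "$1" | tr -d '[:space:]' | grep -q '"status":"ok"'
}

# Hypothetical usage (ca_host and port are placeholders, as in the comment above):
#   body=$(curl --silent --cacert /home/step/certs/root_ca.crt https://ca_host:port/health)
#   health_body_ok "$body" && echo "CA answered the health check"
```

If curl itself fails with a certificate error, that points at a name mismatch between the URL and the CA's certificate rather than at the CA being down.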
@tashian: One factor could be that WSL 2 with Docker for Desktop is used.
@strarsis Please provide more details about the environment and steps to reproduce, so we can test this. Thanks
I am using WSL 2 with Docker for Desktop and getting the same issue. Can you please provide more info on the factor you mentioned above?
Hi folks, I am having an issue with this container's healthcheck on docker swarm as well. I can reproduce very easily and I think I know what the problem may be (for me, at least).
docker run -it -e STEPDEBUG=1 smallstep/step-ca:0.16.0 sh
~ $ step ca init
✔ (e.g. Smallstep): Deisel
What DNS names or IP addresses would you like to add to your new CA?
✔ (e.g. ca.smallstep.com[,1.1.1.1,etc.]): ca.diesel.net
What IP and port will your new CA bind to?
✔ (e.g. :443 or 127.0.0.1:4343): :443
What would you like to name the CA's first provisioner?
✔ (e.g. [email protected]): test
Choose a password for your CA keys and first provisioner.
✔ [leave empty and we'll generate one]:
✔ Password: gj%Dyy-[BuKp#.EP(%vl,!#{`fF4$cH,
Generating root certificate...
all done!
Generating intermediate certificate...
all done!
✔ Root certificate: /home/step/certs/root_ca.crt
✔ Root private key: /home/step/secrets/root_ca_key
✔ Root fingerprint: 2675480ce53fa83431099ddafe152f532ad0a197a6784a1a4641be32969f2578
✔ Intermediate certificate: /home/step/certs/intermediate_ca.crt
✔ Intermediate private key: /home/step/secrets/intermediate_ca_key
✔ Database folder: /home/step/db
✔ Default configuration: /home/step/config/defaults.json
✔ Certificate Authority configuration: /home/step/config/ca.json
Your PKI is ready to go. To generate certificates for individual services see 'step help ca'.
FEEDBACK 😍 🍻
The step utility is not instrumented for usage statistics. It does not
phone home. But your feedback is extremely valuable. Any information you
can provide regarding how you're using `step` helps. Please send us a
sentence or two, good or bad: [email protected] or join
https://github.com/smallstep/certificates/discussions.
~ $ step ca health
Get "https://ca.diesel.net/health": x509: certificate is valid for 9721f7d721878f7496b87c17dcab760d.2868b98699e09c78a80c69bee273ddd8.traefik.default, not ca.diesel.net
client.Health; client GET https://ca.diesel.net/health failed
github.com/smallstep/certificates/errs.Wrapf
/go/pkg/mod/github.com/smallstep/[email protected]/errs/error.go:122
github.com/smallstep/certificates/ca.(*Client).Health
/go/pkg/mod/github.com/smallstep/[email protected]/ca/client.go:612
github.com/smallstep/cli/command/ca.healthAction
/src/command/ca/health.go:79
github.com/urfave/cli.HandleAction
/go/pkg/mod/github.com/urfave/[email protected]/app.go:526
github.com/urfave/cli.Command.Run
/go/pkg/mod/github.com/urfave/[email protected]/command.go:174
github.com/urfave/cli.(*App).RunAsSubcommand
/go/pkg/mod/github.com/urfave/[email protected]/app.go:407
github.com/urfave/cli.Command.startApp
/go/pkg/mod/github.com/urfave/[email protected]/command.go:373
github.com/urfave/cli.Command.Run
/go/pkg/mod/github.com/urfave/[email protected]/command.go:102
github.com/urfave/cli.(*App).Run
/go/pkg/mod/github.com/urfave/[email protected]/app.go:279
main.main
/src/cmd/step/main.go:98
runtime.main
/usr/local/go/src/runtime/proc.go:225
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1371
I use Traefik as a reverse proxy in front of step-ca. It is set up as a simple TCP relay and lets the step-ca container handle all of the TLS itself. This was working perfectly in version 0.15.4 (BEFORE the healthcheck was added). For those unfamiliar, Traefik simply looks at Docker labels to know how to route to the container and do all of its proxying. Due to its simplicity it has become a very popular choice among Docker Swarm and Kubernetes stacks.
TL;DR: The problem is that Traefik will not set up the routing until the healthcheck passes. However, the healthcheck relies on DNS resolution and any proxy configuration already being set up correctly and working in order to succeed.
Is there some way we can disable the health check for more compatibility with a setup like mine? Another thought I had was to perhaps change the healthcheck to check against localhost instead of the configured domain we give it.
@tomdaley92 the problem with your health check is that the dnsNames in the ca.json should include ca.diesel.net too.
Looking at the error, it looks like the domain 9721f7d721878f7496b87c17dcab760d.2868b98699e09c78a80c69bee273ddd8.traefik.default is the one in the ca.json; either that, or Traefik is terminating the TLS instead of passing the TCP connection through to step-ca.
Hi @tomdaley92,
The health check just runs step ca health, which uses the CA url and fingerprint configured in /home/step/config/defaults.json in the container. To get the health check working on your setup, change the CA URL to https://localhost or whatever value will reach the CA directly instead of Traefik. Let me know if this works for you. :D
@tomdaley92 the problem with your health check is that the dnsNames in the ca.json should include ca.diesel.net too.
Looking at the error, it looks like the domain 9721f7d721878f7496b87c17dcab760d.2868b98699e09c78a80c69bee273ddd8.traefik.default is the one in the ca.json; either that, or Traefik is terminating the TLS instead of passing the TCP connection through to step-ca.
Right, so this is exactly my point. Since I have a DNS record pointing to the VM that the proxy lives on, Traefik is serving its self-signed default certificate, because it cannot do the TCP routing while it views the container as unhealthy. This is the classic "chicken or the egg" problem haha. Again, when using the older version without the health check, Traefik picks up the configuration and passes the TCP connection through to the container.
Hi @tomdaley92,
The health check just runs step ca health, which uses the CA url and fingerprint configured in /home/step/config/defaults.json in the container. To get the health check working on your setup, change the CA URL to https://localhost or whatever value will reach the CA directly instead of Traefik. Let me know if this works for you. :D
I will try this, but I would assume the domain I feed step-ca with is needed for it to know what certificate to generate/serve when a client hits https://ca.diesel.net for example. If it generates a certificate for localhost and I come in on ca.diesel.net that's gonna cause problems. Maybe I'm missing something so I'll go ahead and give it a try and thank you for the quick reply as well!
@tashian no luck with setting localhost during step ca init. I even tried adding localhost,ca.diesel.net,127.0.0.1 with no luck either.
Here is my output:
docker run -it -e STEPDEBUG=1 smallstep/step-ca:0.16.0 sh
~ $ step ca init
What would you like to name your new PKI?
✔ (e.g. Smallstep): Diesel
What DNS names or IP addresses would you like to add to your new CA?
✔ (e.g. ca.smallstep.com[,1.1.1.1,etc.]): localhost
What IP and port will your new CA bind to?
✔ (e.g. :443 or 127.0.0.1:4343): :443
What would you like to name the CA's first provisioner?
✔ (e.g. [email protected]): [email protected]
Choose a password for your CA keys and first provisioner.
✔ [leave empty and we'll generate one]:
✔ Password: b^MAT=f9<v=c$IMzRBz[!253V/,k;u7C
Generating root certificate...
all done!
Generating intermediate certificate...
all done!
✔ Root certificate: /home/step/certs/root_ca.crt
✔ Root private key: /home/step/secrets/root_ca_key
✔ Root fingerprint: a19b0ce1f59fc67f36e362675f77e8c46687e4bbe9e66d3ca8439b45159b1d07
✔ Intermediate certificate: /home/step/certs/intermediate_ca.crt
✔ Intermediate private key: /home/step/secrets/intermediate_ca_key
✔ Database folder: /home/step/db
✔ Default configuration: /home/step/config/defaults.json
✔ Certificate Authority configuration: /home/step/config/ca.json
Your PKI is ready to go. To generate certificates for individual services see 'step help ca'.
FEEDBACK 😍 🍻
The step utility is not instrumented for usage statistics. It does not
phone home. But your feedback is extremely valuable. Any information you
can provide regarding how you're using `step` helps. Please send us a
sentence or two, good or bad: [email protected] or join
https://github.com/smallstep/certificates/discussions.
~ $
~ $ step ca health
Get "https://localhost/health": dial tcp 127.0.0.1:443: connect: connection refused
client.Health; client GET https://localhost/health failed
github.com/smallstep/certificates/errs.Wrapf
/go/pkg/mod/github.com/smallstep/[email protected]/errs/error.go:122
github.com/smallstep/certificates/ca.(*Client).Health
/go/pkg/mod/github.com/smallstep/[email protected]/ca/client.go:612
github.com/smallstep/cli/command/ca.healthAction
/src/command/ca/health.go:79
github.com/urfave/cli.HandleAction
/go/pkg/mod/github.com/urfave/[email protected]/app.go:526
github.com/urfave/cli.Command.Run
/go/pkg/mod/github.com/urfave/[email protected]/command.go:174
github.com/urfave/cli.(*App).RunAsSubcommand
/go/pkg/mod/github.com/urfave/[email protected]/app.go:407
github.com/urfave/cli.Command.startApp
/go/pkg/mod/github.com/urfave/[email protected]/command.go:373
github.com/urfave/cli.Command.Run
/go/pkg/mod/github.com/urfave/[email protected]/command.go:102
github.com/urfave/cli.(*App).Run
/go/pkg/mod/github.com/urfave/[email protected]/app.go:279
main.main
/src/cmd/step/main.go:98
runtime.main
/usr/local/go/src/runtime/proc.go:225
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1371
Again, that's just an ad hoc command to debug, but my actual stack is on Docker Swarm:
https://github.com/Diesel-Net/step-ca https://github.com/Diesel-Net/traefik
If I replace ca.diesel.net with localhost in defaults.json AFTER step ca init as a sort of override, and then redeploy the service, it is able to resolve localhost to 127.0.0.1 inside the container, but still fails due to a bad certificate, which is what I expected.

Again thanks for the help everyone, don't mean to blow up this thread. Just posting my findings.
It looks like Traefik doesn't have a way to "not require" health checks in order to set up proxy configurations either
https://github.com/traefik/traefik/issues/7732
Looks like my only options might be either to enable TLS termination on Traefik (but Traefik uses step-ca as its ACME server, so that is another chicken-and-egg problem) or to build a custom Docker image without the healthcheck.
It would be awesome if the healthcheck were made configurable, but I can see why this might be dismissed pretty quickly.
The name in the health check URL has to match a name in dnsNames in ca.json. So, use ca.diesel.net,127.0.0.1,localhost in ca.json, and then change defaults.json to use localhost (or 127.0.0.1).
When you got connect: connection refused above, it looks like you hadn't yet started the step-ca server.
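The two config changes described above can be sketched as follows. This is only an illustration on throwaway files: the JSON snippets are minimal stand-ins (real ca.json and defaults.json files contain more keys), the `workdir` variable is an assumption of this example, and a real edit would target /home/step/config inside the container or its volume.

```shell
# Work on scratch copies so nothing real is touched.
workdir=$(mktemp -d)

# Minimal stand-ins for the two files generated by `step ca init`.
cat > "$workdir/ca.json" <<'EOF'
{
  "address": ":443",
  "dnsNames": ["ca.diesel.net"]
}
EOF
cat > "$workdir/defaults.json" <<'EOF'
{
  "ca-url": "https://ca.diesel.net",
  "ca-config": "/home/step/config/ca.json"
}
EOF

# 1. Let the CA's own certificate cover the public name plus the loopback names.
sed 's|"dnsNames": \["ca.diesel.net"\]|"dnsNames": ["ca.diesel.net", "127.0.0.1", "localhost"]|' \
  "$workdir/ca.json" > "$workdir/ca.json.new" && mv "$workdir/ca.json.new" "$workdir/ca.json"

# 2. Point the step client, and therefore `step ca health`, at the CA directly.
sed 's|"ca-url": "https://ca.diesel.net"|"ca-url": "https://localhost"|' \
  "$workdir/defaults.json" > "$workdir/defaults.json.new" && mv "$workdir/defaults.json.new" "$workdir/defaults.json"

cat "$workdir/ca.json" "$workdir/defaults.json"
```

With both changes in place, the health check talks to the CA over loopback while clients coming in through the proxy can still use ca.diesel.net. The step-ca process presumably needs a restart to pick up the new dnsNames.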
The name in the health check URL has to match a name in dnsNames in ca.json. So, use ca.diesel.net,127.0.0.1,localhost in ca.json, and then change defaults.json to use localhost (or 127.0.0.1).
When you got connect: connection refused above, it looks like you hadn't yet started the step-ca server.
Ahh I will try that, thanks again. FYI that command output connect: connection refused is all running inside the container hence the -it flag on the docker run command. Is there some other step command I'm supposed to run after step ca init in order to "start" the server?
Wewt that was it @tashian, I now have a successful healthcheck! Thank you for taking time out of your day to help me with this, really appreciate it.
Happy to help. The command to start step-ca in the container is /usr/local/bin/step-ca --password-file $PWDPATH $CONFIGPATH (see the Dockerfile's CMD line)
@tomdaley92 Ok I see what's going on with your last output, step-ca is not running.
Looking at your ansible configuration in your github, you're mounting a pre-created configuration, good. So to imitate this using docker run, you need to pre-create the configuration with step ca init, and make sure that the paths in ca.json and defaults.json point to /home/step/* instead of your local path.
Then start the ca with the volume mounted, using the default command and running the health check:
docker run --mount type=bind,source="/tmp/docker",target=/home/step -it -e STEPDEBUG=1 smallstep/step-ca:0.16.0
And in another terminal, exec in and try the health check:
$ docker exec -it 10ce907bea0e sh
~ $ ps
PID USER TIME COMMAND
1 step 0:00 /usr/local/bin/step-ca --password-file /home/step/secrets/password /home/step/config/ca.json
47 step 0:00 sh
55 step 0:00 ps
~ $ step ca health
ok
And if you look at the output of step-ca, you will see that the health check (the one in the Dockerfile) is running every 30s.
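Because the health check only runs periodically, the container sits in the starting state for a while before Docker reports a verdict. A small polling helper can make that wait explicit. This is a sketch, not part of the image: `wait_healthy` and its retry defaults are made up for illustration, while `docker inspect --format '{{.State.Health.Status}}'` is the standard way to read a container's health status.

```shell
# Poll a container's health status until it is "healthy" or attempts run out.
# Arguments: container name or ID, number of attempts (default 12),
# delay between attempts in seconds (default 5).
wait_healthy() {
  container=$1
  tries=${2:-12}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical usage, with the container ID from the transcript above:
#   wait_healthy 10ce907bea0e && echo "step-ca is healthy"
```

With the defaults this waits up to a minute, which should cover at least one run of a 30s health check.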
Follow-up with the recent smallstep/step-ca:0.20.0: after starting the container, its status is running (starting) for quite some time and then turns to running (unhealthy).
I ran into the same DNS resolution problem when using docker swarm. Instead of modifying ca.json and defaults.json, I used the extra_hosts service option to provide DNS resolution in the container of ca.diesel.net to 127.0.0.1. This is my stack compose file. Note that it takes around 30 seconds for the service to finish coming up.
version: '3.4'
networks:
  step:
    external: true
volumes:
  step_home_step:
    external: true
services:
  step:
    extra_hosts:
      - 'ca.diesel.net:127.0.0.1'
    image: smallstep/step-ca:0.20.0
    networks:
      - step
    volumes:
      - source: step_home_step
        target: /home/step
        type: volume
        volume:
          nocopy: true
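To confirm that an extra_hosts entry like the one above took effect, one option is to look at the container's /etc/hosts, which is where Docker writes such entries. The helper below is a hedged sketch: `hosts_lookup` is a made-up name, and it only parses a hosts-format file rather than doing a real DNS lookup.

```shell
# Print the first address that a hosts-format file maps NAME to (nothing if absent).
# Arguments: host name, path to the hosts file (default /etc/hosts).
hosts_lookup() {
  name=$1
  file=${2:-/etc/hosts}
  awk -v n="$name" '
    /^[ \t]*#/ { next }                      # skip comment lines
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break                 # ignore trailing comments
        if ($i == n) { print $1; exit }
      }
    }
  ' "$file"
}

# Hypothetical usage from the Docker host, with <container> as a placeholder:
#   docker exec <container> cat /etc/hosts > /tmp/step-hosts
#   hosts_lookup ca.diesel.net /tmp/step-hosts
```

If this prints 127.0.0.1 for ca.diesel.net, the health check inside the container should reach the CA directly instead of going out through the proxy.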
@sunvalleyfoods that's a nice approach!
I'm going to convert this into a discussion so people can find it for posterity.
@strarsis if you're still encountering this issue, could you please open a new issue and provide some more context about your deployment of step-ca?