consul-template icon indicating copy to clipboard operation
consul-template copied to clipboard

[windows]: Kill failed: `"not supported by windows"`

Open komapa opened this issue 2 years ago • 9 comments

Consul Template version

2022-07-22T06:56:28.852Z [INFO] consul-template v0.29.1 (4525703)

Configuration

log_level = "trace"

# Setting this value to the empty string will cause consul-template
# to not listen for any reload signals but pass them down to nginx instead.
reload_signal = ""

consul {
  retry {
    enabled = true
    attempts = 10
    max_backoff = "1s"
  }
}

vault {
  retry {
    enabled = true
    attempts = 10
    max_backoff = "1s"
  }
}

template {
  destination = "secrets/ssl/keys/connect-ca.pem"
  contents = <<EOC
{{ range caRoots }}{{ .RootCertPEM }}{{ end }}
EOC
}

template {
  destination = "secrets/ssl/keys/connect-client.pem"
  contents = <<EOC
{{ with caLeaf "connect-ingress" }}{{ .CertPEM }}{{ end }}
EOC
}

template {
  destination = "secrets/ssl/keys/connect-client.key"
  contents = <<EOC
{{ with caLeaf "connect-ingress" }}{{ .PrivateKeyPEM }}{{ end }}
EOC
}

template {
  source      = "templates/certificates.hcl.ctmpl"
  destination = "secrets/certificates.hcl"
  exec {
    command = ["consul-template", "-once", "-config", "secrets/certificates.hcl"]
    timeout = "30s"
  }
}

template {
  source      = "templates/ingress.conf.ctmpl"
  destination = "local/nginx/stream.d/ingress.conf"
}

template {
  source      = "templates/ingress-upstreams.conf.ctmpl"
  destination = "local/nginx/conf.d/ingress-upstreams.conf"
}

template {
  source      = "templates/http-status.conf.ctmpl"
  destination = "local/nginx/conf.d/http-status.conf"
}

exec {
  # We are setting the kill_timeout to the max 5m that Nomad is going to wait before killing consul-template itself
  # This way we allow Nomad to be the one controlling the kill_timeout and not consul-template
  kill_timeout = "5m"

  # This is a random splay to wait before killing the command. The default
  # value is 0 (no wait), but large clusters should consider setting a splay
  # value to prevent all child processes from reloading at the same time when
  # data changes occur. When this value is set to non-zero, Consul Template
  # will wait a random period of time up to the splay value before reloading
  # or killing the child process. This can be used to prevent the thundering
  # herd problem on applications that do not gracefully reload.
  splay = "30s"

  # It is very important to use the "array" syntax when defining parameters to the command,
  # otherwise consul-template does not pass signals correctly to the spawned process.
  # See: https://github.com/hashicorp/consul-template#multiple-commands
  command = ["nginx"]
}

Command

CMD ["consul-template", "-config", "C:/config.hcl"]

Debug output

https://gist.github.com/kirooshu/886104e80c8b6c185a803370777803b6

Expected behavior

In this case I would have expected consul-template to be able to terminate the nginx process and restart.

Actual behavior

2022-07-22T17:11:11.166Z [ERR] (child) Kill failed: OpenProcess: The parameter is incorrect.

komapa avatar Jul 22 '22 17:07 komapa

Hey @kirooshu, thanks for the report.

This has to come from one of the 2 calls in this block. The setpgid bit is the only change in recent versions that might have triggered this issue.

https://github.com/hashicorp/consul-template/blob/ae9402466397618b825858e75a00ebb04587f228/child/child.go#L389-L399

This is probably fixed on HEAD as I've already changed this on the main branch to only use setpgid for a specific kind of sub-process that you won't have in a Windows environment.

TLDR; I think this is already fixed on the main branch. Can you build it yourself to test? If not I could probably get you a binary of it.

Thanks.

eikenb avatar Jul 25 '22 19:07 eikenb

Thank you. For us, https://github.com/hashicorp/consul-template/issues/1428 is more important to actually make sure consul-template does not even kill the nginx process. If you get me a windows build of latest, I can definitely try it and confirm that it is fixed in practice. I can also test when you release the new version as well, unless you think that is too late to confirm with a release?

komapa avatar Jul 26 '22 06:07 komapa

I'm working on a 0.29.2 bugfix release now (unrelated to this, but forcing a priority change) and will be releasing that in the next few weeks. I'm going to go with my guess that this is fixed already by the other changes and resolve this as part of that release. If for some reason it is still around we can always reopen this. Thanks.

Fixed by #1600

eikenb avatar Aug 01 '22 22:08 eikenb

Hello @eikenb, unfortunately the error now switched to this:

2022-09-29T21:07:22.887Z [INFO] (child) reloading process
2022-09-29T21:07:37.327Z [ERR] (cli) 1 error occurred:
	* not supported by windows

I even tried to force to a known signal, same problem:

CMD ["consul-template", "-config", "C:/config.hcl", "-exec-kill-signal", "SIGTERM", "-exec-reload-signal", "SIGTERM"]

Any ideas?

komapa avatar Sep 29 '22 21:09 komapa

Hey @kirooshu... I'm not sure what's up with that. Windows supports SIGTERM and I can't find any issues eyeballing the code. Any chance of a conflicting setting in that config.hcl file?

eikenb avatar Sep 30 '22 19:09 eikenb

This is the config on disk:

C:\>type config.hcl
log_level = "info" 

# Setting this value to the empty string will cause consul-template
# to not listen for any reload signals but pass them down to nginx instead.   
reload_signal = ""

consul {
  retry {
    enabled = true
    attempts = 10
    max_backoff = "1s"
  }
}

vault {
  retry {
    enabled = true
    attempts = 10
    max_backoff = "1s"
  }
}

template {
  destination = "secrets/ssl/keys/connect-ca.pem"
  contents = <<EOC
{{ range caRoots }}{{ .RootCertPEM }}{{ end }}
EOC
}

template {
  destination = "secrets/ssl/keys/connect-client.pem"
  contents = <<EOC
{{ with caLeaf "connect-ingress" }}{{ .CertPEM }}{{ end }}
EOC
}

template {
  destination = "secrets/ssl/keys/connect-client.key"
  contents = <<EOC
{{ with caLeaf "connect-ingress" }}{{ .PrivateKeyPEM }}{{ end }}
EOC
}

template {
  source      = "templates/certificates.hcl.ctmpl"
  destination = "secrets/certificates.hcl"
  exec {
    command = ["consul-template", "-once", "-config", "secrets/certificates.hc
l"]
    timeout = "30s"
  }
}

template {
  source      = "templates/http-status.conf.ctmpl"
  destination = "local/nginx/conf.d/http-status.conf"
}

template {
  source      = "templates/ingress.conf.ctmpl"
  destination = "local/nginx/stream.d/ingress.conf"
}

template {
  source      = "templates/ingress-upstreams.conf.ctmpl"
  destination = "local/nginx/conf.d/ingress-upstreams.conf"
}

exec {
  # We are setting the kill_timeout to the max 5m that Nomad is going to wait 
before killing consul-template itself
  # This way we allow Nomad to be the one controlling the kill_timeout and not
 consul-template
  kill_timeout = "5m"

  # data changes occur. When this value is set to non-zero, Consul Template
  # will wait a random period of time up to the splay value before reloading
  # or killing the child process. This can be used to prevent the thundering
  # herd problem on applications that do not gracefully reload.
  splay = "30s"

  # It is very important to use the "array" syntax when defining parameters to the command,
  # otherwise consul-template does not pass signals correctly to the spawned process.
  # See: https://github.com/hashicorp/consul-template#multiple-commands
  command = ["nginx"]
}

and confirming this is what is in the Dockerfile: CMD ["consul-template", "-config", "C:/config.hcl", "-exec-kill-signal", "SIGTERM", "-exec-reload-signal", "SIGTERM"]

We run OpenResty 1.21.x on a nanoserver-windows-1809

komapa avatar Oct 18 '22 18:10 komapa

I tried to use windows:servercore-ltsc2019 as a base image and the problem was the same.

komapa avatar Oct 18 '22 18:10 komapa

Forgot to reopen... doing that and updating the title.

eikenb avatar Nov 07 '22 23:11 eikenb

We would still love to find a solution here if possible!

komapa avatar Jan 27 '23 23:01 komapa