Support for AWS NLB "TCP Pings"
Is your feature request related to a problem? Please describe.
We are using an AWS Network Load Balancer in front of Vault. It terminates TLS and then connects to the Vault instances over TLS as well; the health check is an HTTPS check on /v1/sys/health. Everything works perfectly, but our Vault logs are flooded with messages of this type:
Sep 13 08:04:29 vault02 vault[10592]: 2019-09-13T08:04:29.412Z [INFO] http: TLS handshake error from 10.10.1.27:54625: EOF
After thorough investigation, our best hypothesis is that they are caused by the AWS NLB "TCP pings" described here: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
If you add a TLS listener to your Network Load Balancer, we perform a listener connectivity test. As TLS termination also terminates a TCP connection, a new TCP connection is established between your load balancer and your targets. Therefore, you might see the TCP pings for this test sent from your load balancer to the targets that are registered with your TLS listener. You can identify these TCP pings because they have the source IP address of your Network Load Balancer and the connections do not contain data packets.
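To see what such a "TCP ping" looks like from the server side, here is a minimal Python sketch (not Vault's or AWS's code). It starts a throwaway local listener, connects to it, and closes the connection without sending a byte, which is the pattern the AWS documentation describes. The helper names `classify_connection`, `probe` and `run_once` are made up for illustration. A TLS server waiting for a ClientHello would hit the same end-of-stream, which would be consistent with the EOF in the log line above.

```python
import socket
import threading


def classify_connection(conn: socket.socket, timeout: float = 2.0) -> str:
    """Return 'tcp-ping' if the peer closes without sending data, else 'data'."""
    conn.settimeout(timeout)
    try:
        first = conn.recv(1)
    except socket.timeout:
        # Peer kept the connection open but stayed silent.
        return "idle"
    # recv() returning b"" means the peer closed before sending anything.
    return "data" if first else "tcp-ping"


def probe(host: str, port: int, payload: bytes = b"") -> None:
    """Open a TCP connection, optionally send a payload, then close it."""
    with socket.create_connection((host, port), timeout=2.0) as sock:
        if payload:
            sock.sendall(payload)


def run_once(payload: bytes = b"") -> str:
    """Start a local listener, probe it once, and classify the probe."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(2.0)
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        host, port = server.getsockname()
        client = threading.Thread(target=probe, args=(host, port, payload))
        client.start()
        conn, _addr = server.accept()
        with conn:
            result = classify_connection(conn)
        client.join()
        return result


if __name__ == "__main__":
    print(run_once())                 # empty connection, like the NLB test
    print(run_once(b"\x16\x03\x01"))  # first bytes of a TLS record
```

Running it against a real Vault listener instead of the local one (call `probe` with your own host and port) should be enough to reproduce the log line, assuming the hypothesis above is right.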
Describe the solution you'd like
Silently ignore those "TCP pings" (or at least have an option to do so), as Vault users could think something is wrong while everything is actually fine (plus it's flooding the logs).
Describe alternatives you've considered
Ignoring the warnings, as they seem harmless.
I suspect there's an issue with the way the load balancer is configured to hit the Vault health endpoint. The error doesn't originate from Vault itself, but from one of Go's built-in libraries. There are many posts regarding the issue; this one, for example.
Would you be willing to share a couple more things?
- Your Vault configuration
- Your ELB healthcheck configuration for hitting Vault
- Can you confirm you've configured certificates, if needed, as described here under "Step 3: Configure Security Settings"?
If you're still receiving that message after checking all that through, that should give us sufficient steps to reproduce the log line.
Thank you!
Hello @tyrannosaurus-becks, thanks a lot for your answer. I'm happy to share whatever can help:
My Vault config:
{
  "ui": true,
  "pid_file": "/run/vault/vault.pid",
  "storage": {
    "consul": {
      "address": "unix:///var/local/consul/consul.sock"
    }
  },
  "listener": {
    "tcp": {
      "address": "0.0.0.0:8200",
      "tls_cert_file": "/etc/vault.d/server.cert",
      "tls_key_file": "/etc/vault.d/server.key"
    }
  },
  "seal": {
    "awskms": {
      "region": "eu-west-1",
      "kms_key_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    }
  }
}
My LB config (the Terraform config, which I think is worth many words ;) ):
resource "aws_lb" "vault" {
  name = "${var.project_name}-vault-nlb"
  internal = true
  load_balancer_type = "network"
  subnets = "${aws_subnet.main.*.id}"
  enable_cross_zone_load_balancing = true
  tags = var.tags
}

resource "aws_lb_target_group" "vault" {
  name = "${var.project_name}-vault-nlb-tg"
  port = 8200
  protocol = "TLS"
  vpc_id = "${aws_vpc.vpc.id}"
  target_type = "instance"
  health_check {
    path = "/v1/sys/health"
    port = "traffic-port"
    protocol = "HTTPS"
    enabled = true
    healthy_threshold = 2
    unhealthy_threshold = 2
  }
  tags = var.tags
}

resource "aws_acm_certificate" "certificate" {
  domain_name = "${var.domain_name}"
  validation_method = "DNS"
  tags = var.tags
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_lb_listener" "vault" {
  load_balancer_arn = "${aws_lb.vault.arn}"
  port = "443"
  protocol = "TLS"
  ssl_policy = "ELBSecurityPolicy-2016-08"
  certificate_arn = "${aws_acm_certificate.certificate.arn}"
  default_action {
    type = "forward"
    target_group_arn = "${aws_lb_target_group.vault.arn}"
  }
}

resource "aws_vpc_endpoint_service" "vault" {
  acceptance_required = true
  network_load_balancer_arns = ["${aws_lb.vault.arn}"]
  tags = merge(var.tags, {
    "Name" = "${var.project_name}_vault_vpces"
  })
}
Wouldn't setting HealthCheckProtocol to HTTPS fix this problem?
Isn't that already the case? (See the Terraform config above.)
Maybe? Worth checking on AWS console probably.
I checked in the AWS console: Terraform works properly, and the protocol for the health check is HTTPS, as configured in the tf file above. I assume the health check would have failed anyway if it were not done over HTTPS (as noted, everything is working fine).
Hi @forty, I came across this project: https://github.com/jen20/vault-health-checker. Wanted to share in case you still need a solution.
@kwilczynski - that's what we did as well and it works nicely.
Hi @ftcjeff, nice! Thank you for letting me know!
I am sure that @jen20 will be happy to know that his project solves this problem so nicely! It's great.
@kwilczynski the project you mentioned states:
Unfortunately, the AWS NLB does not support HTTP health checks, instead supporting only TCP checks. While TCP checks can be pointed at a Vault server, they cannot determine the actual health of the instance, and fill the logs of the Vault server with spam related to unencrypted requests.
Ideally, the NLB will eventually support HTTP health checks and this project will become obsolete.
which is incorrect. My vault NLB is configured to do HTTPS health checks (as you can see in my TF config above), and I still have this issue.
Running into the same issue here. It'd be nice if there were a flag or something we could set to ignore TLS warnings, or even to whitelist specific CIDR blocks or IP addresses. Or maybe make these types of warnings DEBUG level rather than INFO level.
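Until something like this exists in Vault, one workaround is to drop these lines downstream, in whatever ships the logs. Below is a minimal Python sketch of that idea. It assumes the log format shown in the original report and IPv4 sources only. `is_lb_probe_noise` and `filter_lines` are hypothetical helpers, not part of Vault or of any log shipper, and the CIDR in the usage example is a placeholder for your own load balancer subnets.

```python
import ipaddress
import re

# Matches the Go net/http line quoted in the report, for example:
#   http: TLS handshake error from 10.10.1.27:54625: EOF
HANDSHAKE_EOF = re.compile(
    r"http: TLS handshake error from "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):\d+: EOF\s*$"
)


def is_lb_probe_noise(line, lb_networks):
    """True if the line is a handshake EOF coming from one of lb_networks."""
    match = HANDSHAKE_EOF.search(line)
    if match is None:
        return False
    try:
        source = ipaddress.ip_address(match.group("ip"))
    except ValueError:
        # Looks like an address but is not a valid one; keep the line.
        return False
    return any(source in network for network in lb_networks)


def filter_lines(lines, lb_cidrs):
    """Return the lines with load balancer probe noise removed."""
    networks = [ipaddress.ip_network(cidr) for cidr in lb_cidrs]
    return [line for line in lines if not is_lb_probe_noise(line, networks)]


if __name__ == "__main__":
    sample = [
        "2019-09-13T08:04:29.412Z [INFO] http: TLS handshake error "
        "from 10.10.1.27:54625: EOF",
        "2019-09-13T08:04:30.000Z [INFO] http: TLS handshake error "
        "from 192.168.5.9:40000: EOF",
    ]
    for line in filter_lines(sample, ["10.10.1.0/24"]):
        print(line)
```

Matching on the source network rather than on the message alone keeps handshake failures from other clients visible.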
@forty It was correct at the time it was written (see the date stamp on the README!), but may no longer be.
https://github.com/golang/go/issues/26918
This is still a problem in Ali Cloud, and SLBs there don't support HTTPS health checks. Just TCP and HTTP. Wouldn't it be possible to trap that log and move it to Debug instead of Info?
When using NLBs, your EC2 instances need to allow the NLB's SUBNET CIDRs; you can't grant access by referencing the subnet ID that is attached to the NLB.
The second problem is the health endpoint. /v1/sys/health?standbyok is what you want. HOWEVER, a sealed Vault does not return 200 (it returns 503 by default), and when using an NLB your HTTP/HTTPS health checks MUST return 200. There's a parameter called "matcher" that lets you set which HTTP response codes count as healthy, HOWEVER that parameter is not allowed with NLBs.
Hope that helps
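To make the status-code point concrete, here is a small Python sketch of which node states would pass a health check that only accepts 200. The status table reflects the documented defaults of /v1/sys/health as I understand them, so please verify it against the docs for your Vault version. The state labels and function names are made up for illustration.

```python
from urllib.parse import urlencode

# Default status codes of /v1/sys/health as I understand the docs;
# the keys are illustrative labels, not Vault identifiers.
DEFAULT_STATUS = {
    "active": 200,
    "standby": 429,
    "dr_secondary": 472,
    "perf_standby": 473,
    "uninitialized": 501,
    "sealed": 503,
}


def health_check_path(standby_ok=False, perf_standby_ok=False):
    """Build the health check path with the chosen query parameters."""
    params = {}
    if standby_ok:
        params["standbyok"] = "true"
    if perf_standby_ok:
        params["perfstandbyok"] = "true"
    query = urlencode(params)
    return "/v1/sys/health" + ("?" + query if query else "")


def expected_status(state, standby_ok=False, perf_standby_ok=False):
    """Status code the endpoint is expected to return for a node state."""
    if state == "standby" and standby_ok:
        return DEFAULT_STATUS["active"]
    if state == "perf_standby" and perf_standby_ok:
        return DEFAULT_STATUS["active"]
    return DEFAULT_STATUS[state]


def passes_200_only_check(state, **flags):
    """True if a health check that only accepts 200 would pass."""
    return expected_status(state, **flags) == 200


if __name__ == "__main__":
    print(health_check_path(standby_ok=True))
    for state in DEFAULT_STATUS:
        print(state, passes_200_only_check(state, standby_ok=True))
```

With `standbyok` set, standby nodes stay in rotation while sealed or uninitialized nodes still fail the check, which is usually the behaviour you want behind a load balancer.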
As of v1.12.0 this issue is still happening and flooding logs.