Support using an ALB between VA and RVA

Open mcpherrinm opened this issue 4 months ago • 2 comments

We have a bit of fragility between the VA and RVA because of our use of DNS to locate RVA servers. As well, our private PKI needs to be stretched out from our secure DCs to the cloud environment, which causes excess complexity.

We'd like to fix both of these problems at once, by using an AWS Application Load Balancer (ALB) between the VAs and RVAs.
ALBs can load balance gRPC, and can validate the incoming mTLS connection. The ALB will have a Web PKI certificate issued by AWS.

We need two changes from Boulder to make this happen:

  • [ ] In the RVA, support exposing the RPCs without requiring client certificate authentication
  • [ ] In the VA, allow specifying a different CA trust store for outbound connections (or potentially to use the system trust store)

The first is more important than the second: as a workaround for the second, we can manually install a private PKI certificate in the ALB, but we'd rather not have to write automation for that.

Having a general option to run the RVAs unauthenticated is somewhat high-risk for other services, so we may want that to block on https://github.com/letsencrypt/boulder/issues/5294 so that the setting can be self-contained only in the RVAs.

mcpherrinm, Mar 15 '24 18:03

We previously removed our use of AWS Network Load Balancer (NLB) between our RVAs and Unbounds. Have we ever used NLB between the VAs and the RVAs? If we did, would that let us dynamically route traffic to all of our RVAs without having to let the load-balancer terminate TLS?

aarongable, Mar 19 '24 18:03

Using an NLB as a layer-4 proxy is an option, but it won't work as well as an ALB:

  • we can't use it for autoscaling based on request volume, which would be a big improvement for our autoscaling policies
  • it load balances connections rather than RPCs, so we can't gracefully remove instances without terminating in-flight RPCs
  • we can't use the ALB's gRPC health check features to ensure the instance is properly up and running

mcpherrinm, Mar 19 '24 20:03