
autopilot stats_fetcher gets mTLS errors sending cross-region RPCs

Open tgross opened this issue 3 years ago • 0 comments

@ron-savoia reported getting the following errors with federated clusters using mTLS:

2022-10-28T20:33:32.124Z [WARN] nomad.rpc: failed TLS handshake: remote_addr=192.168.1.190:55376 error="remote error: tls: bad certificate"
2022-10-28T20:33:32.755Z [WARN] nomad.stats_fetcher: error getting server health: server=server1.west error="rpc error: failed to get conn: x509: certificate is valid for server.west.nomad, localhost, not server.east.nomad"

But normal federated operations, such as submitting a job across regions, work as expected, so the mTLS configuration itself is fine. The stats_fetcher code was updated when we did the autopilot update in 1.4.0 in https://github.com/hashicorp/nomad/pull/14441. The RPCs involved here should be doing the same cross-region checks as other federated RPCs. That said, from a quick look at the code, it's not clear to me that we should be making cross-region stats_fetcher requests here at all.

tgross, Oct 31 '22 14:10