Routing token fix

Open johnalotoski opened this issue 3 years ago • 0 comments

Why:

  • This fixes the Consul-related Traefik failures that occur on routing after 32 days. The cause is that the Consul secrets engine of Vault does not issue periodic tokens (i.e. tokens that are renewable past the default system max TTL), so the Consul token eventually expires and requests are rejected with "ACL not found". Errors from this failure appear on the router like the following examples:
# Consul service:

  [INFO]  agent.http: Request cancelled: method=GET url=/v1/agent/connect/ca/roots?dc=eu-central-1&index=12 from=127.0.0.1:44784 error="context canceled"
  [ERROR] agent.client: RPC failed to server: method=ConnectCA.Sign server=172.16.2.10:8300 error="rpc error making call: rpc error making call: ACL not found"
  [ERROR] agent.http: Request error: method=GET url=/v1/agent/connect/ca/leaf/traefik?dc=eu-central-1&index=2248521 from=127.0.0.1:44796 error="rpc error making call: rpc error making call: ACL not found"
  [ERROR] agent.client: RPC failed to server: method=Catalog.ListServices server=172.16.1.10:8300 error="rpc error making call: rpc error making call: ACL not found"
  [ERROR] agent.http: Request error: method=GET url=/v1/catalog/services?consistent=&dc=eu-central-1 from=127.0.0.1:44798 error="rpc error making call: rpc error making call: ACL not found"
  [INFO]  agent.http: Request cancelled: method=GET url=/v1/agent/connect/ca/roots?dc=eu-central-1&index=12 from=127.0.0.1:44792 error="context canceled"
  [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=172.16.2.10:8300 error="rpc error making call: rpc error making call: ACL not found"
  [ERROR] agent: Coordinate update error: error="rpc error making call: rpc error making call: ACL not found"
  [ERROR] agent.http: Request error: method=GET url=/v1/catalog/services?consistent=&dc=eu-central-1 from=127.0.0.1:44802 error="rpc error making call: rpc error making call: ACL not found"
  [INFO]  agent.http: Request cancelled: method=GET url=/v1/agent/connect/ca/leaf/traefik?dc=eu-central-1&index=2248521 from=127.0.0.1:44796 error="context canceled"
  [INFO]  agent.http: Request cancelled: method=GET url=/v1/agent/connect/ca/roots?dc=eu-central-1&index=12 from=127.0.0.1:44798 error="context canceled"
  [ERROR] agent.client: RPC failed to server: method=ConnectCA.Sign server=172.16.1.10:8300 error="rpc error making call: rpc error making call: ACL not found"

# Traefik service:
{"@level":"error","@message":"Watch errored","@module":"consulcatalog.watch","@timestamp":"2022-08-15T13:49:06.113669Z","error":"Unexpected response code: 403 (rpc error making call: ACL not found)","retry":5000000000,"type":"connect_leaf"}
  time="2022-08-15T13:49:06Z" level=error msg="Provider connection error failed to get consul catalog data: Unexpected response code: 403 (rpc error making call: ACL not found), retrying in 6.95333972s" providerName=consulcatalog
...
  • Adds generation of a static traefik-consul-token, updates the static kv paths and the respective policies, and switches the affected services over to the static Consul tokens.
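
To help recognize this failure in collected logs, here is a minimal Python sketch that flags error-level lines carrying the "ACL not found" signature. The sample lines are copied from the excerpts above; the function name `is_acl_not_found` and the matching rules are illustrative assumptions, not part of bitte, Consul or Traefik.

```python
import json

ACL_NOT_FOUND = "ACL not found"

# Sample lines copied from the Consul and Traefik excerpts in this issue.
SAMPLE = [
    '[INFO]  agent.http: Request cancelled: method=GET url=/v1/agent/connect/ca/roots?dc=eu-central-1&index=12 from=127.0.0.1:44784 error="context canceled"',
    '[ERROR] agent.client: RPC failed to server: method=ConnectCA.Sign server=172.16.2.10:8300 error="rpc error making call: rpc error making call: ACL not found"',
    '{"@level":"error","@message":"Watch errored","@module":"consulcatalog.watch","@timestamp":"2022-08-15T13:49:06.113669Z","error":"Unexpected response code: 403 (rpc error making call: ACL not found)","retry":5000000000,"type":"connect_leaf"}',
    'time="2022-08-15T13:49:06Z" level=error msg="Provider connection error failed to get consul catalog data: Unexpected response code: 403 (rpc error making call: ACL not found), retrying in 6.95333972s" providerName=consulcatalog',
]


def is_acl_not_found(line: str) -> bool:
    """Return True for an error-level log line that mentions 'ACL not found'."""
    line = line.strip()
    if ACL_NOT_FOUND not in line:
        return False
    if line.startswith("{"):
        # JSON-formatted line (as emitted by the consulcatalog watcher).
        try:
            return json.loads(line).get("@level") == "error"
        except json.JSONDecodeError:
            return False
    # Plain-text Consul agent lines and logfmt-style Traefik lines.
    return "[ERROR]" in line or "level=error" in line


if __name__ == "__main__":
    hits = [line for line in SAMPLE if is_acl_not_found(line)]
    print(f"{len(hits)} of {len(SAMPLE)} sample lines match")
```

The "Request cancelled" INFO lines are deliberately not matched, since they do not mention the ACL error themselves.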

Migration:

  • Metal deploy core-1 to update the vault policies and push the new static tokens to their associated kv paths. Services will take a minute or so to restart successfully, so you may wish to use the no-auto-rollback and no-magic-rollback features of bitte deploy to avoid a premature rollback.
  • Metal deploy routing after the core-1 deployment. Ensure that vault-agent, consul and traefik have all restarted successfully after the deployment.
  • For the few clusters that have additional hydrate-cluster policies for the routing role beyond the standard metal definitions, a hydrate-cluster plan/apply will also need to be run.
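
The post-deployment check on routing could be scripted along these lines. This is a sketch only: it assumes the three services run as systemd units named exactly `vault-agent`, `consul` and `traefik`, which may differ on a given cluster, and the helper names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumed systemd unit names; adjust to match the actual routing machine.
ROUTING_SERVICES = ("vault-agent", "consul", "traefik")


def systemd_is_active(unit: str) -> bool:
    """Return True if `systemctl is-active --quiet <unit>` exits with 0."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def inactive_services(
    services: Iterable[str] = ROUTING_SERVICES,
    is_active: Callable[[str], bool] = systemd_is_active,
) -> List[str]:
    """Return the services that are not currently active, in the given order."""
    return [name for name in services if not is_active(name)]


if __name__ == "__main__":
    down = inactive_services()
    if down:
        print("not active yet:", ", ".join(down))
    else:
        print("all routing services are active")
```

Because the services take a minute or so to restart, a single failing check right after the deploy is not necessarily a problem; rerunning it after a short wait is the intended use.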

johnalotoski, Aug 16 '22 01:08