Make HCP Link controller run periodically (once per minute)

Open NickCellino opened this issue 1 year ago • 1 comments

Description

This change addresses two inconvenient aspects of the UX of using the HCP Link API:

  1. Previously, when converting a cluster from read-only to read-write in HCP, you had to issue an additional Write to the Link resource to make the Reconcile loop run again and update the link.
  2. Previously, if the connection to HCP broke (e.g. the HCP cluster was deleted or internet connectivity was lost), the link would remain in the LINKED connection status, giving the user the false impression that everything was still working.

This PR addresses both by making the Link controller run periodically (once per minute). With this change, the controller picks up any updates made in HCP automatically, since it queries HCP every time it runs. Additionally, if anything goes wrong with its connection to HCP, the Link status is updated to reflect this.
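
The periodic behavior described above can be sketched roughly as follows. This is a minimal Python illustration only, not the controller's actual Go code: `reconcile_link`, `run_periodically`, `fetch_cluster`, and `RECONCILE_INTERVAL_SECONDS` are hypothetical names, and the Link status is reduced to a single state string. The point is that every pass queries HCP, refreshes the access level on success, marks the link as not linked on failure, and then schedules the next pass.

```python
from typing import Callable, Dict

# Assumed period, matching the one-minute interval discussed in this PR.
RECONCILE_INTERVAL_SECONDS = 60.0


def reconcile_link(link: Dict[str, str],
                   fetch_cluster: Callable[[], Dict[str, str]]) -> float:
    """Run one reconcile pass and return the delay before the next one.

    `fetch_cluster` is a hypothetical stand-in for the call to HCP. It
    returns the cluster's current data or raises on any connection error.
    """
    try:
        cluster = fetch_cluster()
    except Exception:
        # HCP is unreachable or the cluster is gone: stop reporting LINKED.
        link["linked"] = "STATE_FALSE"
    else:
        # Pick up changes made in HCP, such as read-only -> read-write.
        link["accessLevel"] = cluster["accessLevel"]
        link["linked"] = "STATE_TRUE"
    # Requeue unconditionally so the next pass needs no extra Write.
    return RECONCILE_INTERVAL_SECONDS


def run_periodically(link: Dict[str, str],
                     fetch_cluster: Callable[[], Dict[str, str]],
                     sleep: Callable[[float], None],
                     passes: int) -> None:
    """Drive `passes` reconcile passes, sleeping between them."""
    for _ in range(passes):
        sleep(reconcile_link(link, fetch_cluster))
```

Passing `sleep` in as a parameter keeps the sketch testable without waiting a real minute per pass.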

One minute was chosen as the period as a tradeoff between UX and performance. The main UX scenario I'm concerned about is changing a cluster from read-only to read-write in HCP. As a user, I would suspect something was broken if the change did not take effect within a minute. Happy to discuss if others feel differently!

Testing & Reproduction steps

Manual test

I did a manual test to:

  • Verify that updating the cluster from read-only to read-write in the HCP portal is eventually synced to the link resource.
  • Verify that the link status eventually reflects an error with the connection to HCP.

Steps:

  1. Created a cluster in HCP and linked it to my self-managed cluster.
  2. Verified the Link exists in the Consul cluster and has READ_ONLY access mode:

➜  consul git:(nickcellino/link-status) http GET localhost:8500/api/hcp/v2/link/global X-Consul-Token:$CONSUL_HTTP_TOKEN
HTTP/1.1 200 OK
Content-Length: 1489
Content-Type: text/plain; charset=utf-8
Date: Thu, 22 Feb 2024 20:45:45 GMT

{
    "data": {
        "accessLevel": "ACCESS_LEVEL_GLOBAL_READ_ONLY",
        ...
    },
    ...
}

  1. Updated cluster in HCP to read-write
  2. Waited a little while (~1 minute), read the link again, and verified it has READ_WRITE access mode:

➜  consul git:(nickcellino/link-status) http GET localhost:8500/api/hcp/v2/link/global X-Consul-Token:$CONSUL_HTTP_TOKEN
HTTP/1.1 200 OK
Content-Length: 1490
Content-Type: text/plain; charset=utf-8
Date: Thu, 22 Feb 2024 20:50:08 GMT

{
    "data": {
        ...
        "resourceId": "organization/d95434ef-1f14-4b17-b23e-1eb608bb9fda/project/e263159f-b9ea-468e-8dc5-ee3d23403fe2/hashicorp.consul.global-network-manager.cluster/test-ro-cluster"
    },
    "status": {
        "consul.io/hcp/link": {
            "conditions": [
                {
                    "message": "Successfully validated link",
                    "reason": "SUCCESS",
                    "state": "STATE_TRUE",
                    "type": "validated"
                },
                {
                    "message": "Successfully linked to cluster 'organization/d95434ef-1f14-4b17-b23e-1eb608bb9fda/project/e263159f-b9ea-468e-8dc5-ee3d23403fe2/hashicorp.consul.global-network-manager.cluster/test-ro-cluster'",
                    "reason": "SUCCESS",
                    "state": "STATE_TRUE",
                    "type": "linked"
                }
            ],
            "observedGeneration": "01HQ999KNC3BCHT5M0NYE507CZ",
            "updatedAt": "2024-02-22T20:49:44.254775Z"
        }
    },

    ...
}

  1. Turned off WiFi on my laptop to simulate a network connectivity error
  2. Waited ~1 minute, read the link again, and verified that the linked status is now STATE_FALSE:

➜  consul git:(nickcellino/link-status) http GET localhost:8500/api/hcp/v2/link/global X-Consul-Token:$CONSUL_HTTP_TOKEN
HTTP/1.1 200 OK
Content-Length: 1344
Content-Type: text/plain; charset=utf-8
Date: Thu, 22 Feb 2024 20:57:18 GMT

{
    ...
    "status": {
        "consul.io/hcp/link": {
            "conditions": [
                {
                    "message": "Successfully validated link",
                    "reason": "SUCCESS",
                    "state": "STATE_TRUE",
                    "type": "validated"
                },
                {
                    "message": "Failed to link to HCP due to unexpected error",
                    "reason": "FAILED",
                    "state": "STATE_FALSE",
                    "type": "linked"
                }
            ],
            "observedGeneration": "01HQ999KNC3BCHT5M0NYE507CZ",
            "updatedAt": "2024-02-22T20:56:44.968828Z"
        }
    },
    "version": "48"
}
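
The checks in the steps above were done by eye, but they could also be scripted. The following is a rough Python sketch, assuming the response shape shown in the output above (`status`, then `consul.io/hcp/link`, then `conditions`). `condition_state` and `wait_for` are hypothetical helper names, and `fetch` is a caller-supplied function that returns the parsed JSON body of `GET /api/hcp/v2/link/global`.

```python
import time
from typing import Any, Callable, Dict, Optional

# Status key as it appears in the output above.
STATUS_KEY = "consul.io/hcp/link"


def condition_state(link: Dict[str, Any], cond_type: str) -> Optional[str]:
    """Return the state of the condition with the given type, or None."""
    conditions = link.get("status", {}).get(STATUS_KEY, {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("state")
    return None


def wait_for(fetch: Callable[[], Dict[str, Any]],
             predicate: Callable[[Dict[str, Any]], bool],
             timeout: float = 120.0,
             interval: float = 5.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `fetch` until `predicate(link)` holds or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if predicate(fetch()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For the WiFi step, `wait_for(fetch, lambda link: condition_state(link, "linked") == "STATE_FALSE")` would return `True` once a later controller run has recorded the failure. The two-minute default timeout is an arbitrary choice that leaves room for one full period plus some slack.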

Links

PR Checklist

  • [ ] updated test coverage
  • [ ] external facing docs updated
  • [ ] appropriate backport labels added
  • [ ] not a security concern

NickCellino, Feb 22 '24 20:02

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

github-actions[bot], Apr 23 '24 01:04

Closing due to inactivity. If you feel this was a mistake or you wish to re-open at any time in the future, please leave a comment and it will be re-surfaced for the maintainers to review.

github-actions[bot], May 23 '24 01:05