
Support generic health check

Open · chanwit opened this issue 2 years ago · 3 comments

From @phoban01 on #44

We might want to consider adding a successThreshold and a failureThreshold so that transient network errors don't trigger health-check failures.
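The threshold idea can be sketched as a small debouncer. This is a minimal sketch, assuming Kubernetes-probe-like semantics (the reported state flips only after N consecutive successes or M consecutive failures); thresholdTracker and HealthState are hypothetical names, not tf-controller API.

```go
package main

import "fmt"

// HealthState is the debounced result reported for a health check (hypothetical).
type HealthState string

const (
	Unknown   HealthState = "Unknown"
	Healthy   HealthState = "Healthy"
	Unhealthy HealthState = "Unhealthy"
)

// thresholdTracker (hypothetical) debounces raw probe results: the reported
// state only changes after SuccessThreshold consecutive successes or
// FailureThreshold consecutive failures.
type thresholdTracker struct {
	SuccessThreshold int
	FailureThreshold int
	successes        int
	failures         int
	state            HealthState
}

func newThresholdTracker(success, failure int) *thresholdTracker {
	return &thresholdTracker{SuccessThreshold: success, FailureThreshold: failure, state: Unknown}
}

// Observe records one probe result and returns the (possibly unchanged) state.
func (t *thresholdTracker) Observe(ok bool) HealthState {
	if ok {
		t.successes++
		t.failures = 0 // any success resets the failure streak
		if t.successes >= t.SuccessThreshold {
			t.state = Healthy
		}
	} else {
		t.failures++
		t.successes = 0 // any failure resets the success streak
		if t.failures >= t.FailureThreshold {
			t.state = Unhealthy
		}
	}
	return t.state
}

func main() {
	t := newThresholdTracker(1, 3)
	// Isolated failures after becoming healthy do not flip the state;
	// only the third consecutive failure does.
	for _, ok := range []bool{true, false, true, false, false, false} {
		fmt.Println(t.Observe(ok))
	}
}
```

With a failureThreshold of 3, the run above stays Healthy until the final, third consecutive failure.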

In general, TCP/HTTP health checks may be of limited use, since there will often be no network connectivity between the resources being created and the tf-controller. If Terraform outputs could be passed to an exec check, something like the following would be more practical:

healthChecks:
- name: bucket
  exec: "aws s3api head-bucket --bucket {{ outputs.bucket_name }}"
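Rendering the proposed {{ outputs.NAME }} placeholders into an argument list could look like the sketch below. It assumes the placeholder syntax from the example above (which is not valid Go text/template syntax, hence the regexp) and that the command is run without a shell; renderExecArgs is a hypothetical helper, and shell quoting is deliberately not supported.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches the "{{ outputs.NAME }}" syntax proposed in the issue,
// with or without spaces inside the braces.
var placeholder = regexp.MustCompile(`\{\{\s*outputs\.([A-Za-z0-9_]+)\s*\}\}`)

// renderExecArgs (hypothetical) turns an exec string into an argv slice.
// Placeholders are compacted first so that splitting on whitespace keeps each
// one inside a single argument; outputs are substituted afterwards, so an
// output value containing spaces stays a single argument.
func renderExecArgs(cmd string, outputs map[string]string) ([]string, error) {
	compact := placeholder.ReplaceAllString(cmd, "{{outputs.${1}}}")
	var args []string
	for _, field := range strings.Fields(compact) {
		var missing []string
		rendered := placeholder.ReplaceAllStringFunc(field, func(m string) string {
			name := placeholder.FindStringSubmatch(m)[1]
			v, ok := outputs[name]
			if !ok {
				missing = append(missing, name)
			}
			return v
		})
		if len(missing) > 0 {
			return nil, fmt.Errorf("unknown outputs: %s", strings.Join(missing, ", "))
		}
		args = append(args, rendered)
	}
	return args, nil
}

func main() {
	args, err := renderExecArgs(
		"aws s3api head-bucket --bucket {{ outputs.bucket_name }}",
		map[string]string{"bucket_name": "my-bucket"},
	)
	if err != nil {
		panic(err)
	}
	// args could then be passed to exec.Command(args[0], args[1:]...),
	// which runs the binary directly rather than through a shell.
	fmt.Printf("%q\n", args)
}
```

Avoiding a shell narrows, but does not remove, the security concern raised later in the thread: the controller still runs whatever binary the manifest names.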

chanwit · Jan 21 '22 11:01

This would require the CLI tools to be installed in the controller image. TerraformRunner could help here by letting users customize their runner image (a new feature?).

tomhuang12 · Jan 24 '22 19:01

We'll decide on this one later, as it might compromise security.

chanwit · Jan 25 '22 07:01

Thinking a bit more about this, an event-driven approach might be a good fit for generic health checks. For example, in the context of AWS I can subscribe to an EventBridge bus and listen for resource creation events; the Flux notification controller could be configured to receive these events and forward them to the terraform controller. This keeps cloud-provider-specific implementations out of the terraform controller.

We would still need logic for mapping events to Terraform outputs, defining the health check, and tracking its state. The Flux notification controller doesn't have a receiver for EventBridge or Pub/Sub, but this is something we could add.
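The mapping and state-tracking logic mentioned above might be sketched as follows, assuming the notification controller forwards a provider-neutral event carrying a resource identifier, and that a health check matches when that identifier equals a named Terraform output. ResourceEvent, EventHealthCheck, and tracker are hypothetical types; the "aws.s3" source string is illustrative only.

```go
package main

import "fmt"

// ResourceEvent is a hypothetical, provider-neutral event forwarded by the
// notification controller (e.g. translated from an EventBridge or Pub/Sub message).
type ResourceEvent struct {
	Source     string // e.g. "aws.s3"; illustrative only
	ResourceID string // identifier of the resource the event refers to
	Healthy    bool   // whether the event reports the resource as ready
}

// EventHealthCheck (hypothetical) ties a health check to a Terraform output:
// an event matches when its Source matches and its ResourceID equals the output value.
type EventHealthCheck struct {
	Name   string
	Source string
	Output string
}

// tracker records the last known state per health check name.
type tracker struct {
	checks  []EventHealthCheck
	outputs map[string]string
	state   map[string]string // "Pending", "Healthy" or "Unhealthy"
}

func newTracker(checks []EventHealthCheck, outputs map[string]string) *tracker {
	st := make(map[string]string, len(checks))
	for _, c := range checks {
		st[c.Name] = "Pending" // no matching event seen yet
	}
	return &tracker{checks: checks, outputs: outputs, state: st}
}

// Handle updates every check whose source and output value match the event
// and returns the names of the checks it touched.
func (t *tracker) Handle(ev ResourceEvent) []string {
	var matched []string
	for _, c := range t.checks {
		want, ok := t.outputs[c.Output]
		if !ok || c.Source != ev.Source || want != ev.ResourceID {
			continue
		}
		if ev.Healthy {
			t.state[c.Name] = "Healthy"
		} else {
			t.state[c.Name] = "Unhealthy"
		}
		matched = append(matched, c.Name)
	}
	return matched
}

func main() {
	tr := newTracker(
		[]EventHealthCheck{{Name: "bucket", Source: "aws.s3", Output: "bucket_name"}},
		map[string]string{"bucket_name": "my-bucket"},
	)
	tr.Handle(ResourceEvent{Source: "aws.s3", ResourceID: "other-bucket", Healthy: true})
	fmt.Println(tr.state["bucket"]) // Pending: the event did not match the output
	tr.Handle(ResourceEvent{Source: "aws.s3", ResourceID: "my-bucket", Healthy: true})
	fmt.Println(tr.state["bucket"]) // Healthy
}
```

Matching on output values keeps the controller itself provider-agnostic; only the receiver that translates provider events into this shape would need to know about AWS or GCP.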

phoban01 · Jan 27 '22 10:01