Add architecture documentation supporting users and security teams
Thanks for sharing the tf-controller with the community, this is an awesome component!
expected behavior
As a tf-controller user, in order to understand and troubleshoot the behavior of tf-controller (see the related https://github.com/weaveworks/tf-controller/issues/502#issuecomment-1434535744), I need architecture documentation that answers high-level questions such as:
- what is the lifecycle of the runner pods created by tf-controller, and what is the gRPC dialogue between tf-controller and a runner pod
- where do the terraform specs transit (network path, stored in memory, on disk, etc.)
- where does the tf.state file data transit for the default different bac (network path, stored in memory, on disk, etc.)
As a security team, in order to better understand the responsibilities and scope of the tf-controller components, I need a high-level architecture document similar to the one suggested in https://ostif.org/wp-content/uploads/2021/11/FluxreportFinalV1.1.pdf
4.1.1 Recommendation An improvement in this context would be to have clarification on end-to-end processes in Flux, similar to how Envoy Proxy has an “life of an event” documentation: https://www.envoyproxy.io/docs/envoy/v1.19.1/intro/life_of_a_request From a security perspective such an overview would highly improve the understanding of what trust boundaries Flux assumes and also describe the threat model of Flux. In such end-to-end documentation, it would be of high value to make it clear how the individual components relate to each other as well as describe where authentication and hardening procedures are in place.
current behavior
https://docs.gitops.weave.works/docs/terraform/terraform-intro/ has a functional overview including features; the docs at https://weaveworks.github.io/tf-controller/use_tf_controller/ focus on how to use the product
https://github.com/weaveworks/tf-controller/blob/main/docs/getting_started.md#preflight-checks includes some minimal architecture description
TF-controller uses the Controller/Runner architecture. The Controller acts as a client, and talks to each Runner's Pod via gRPC. Please make sure
1. Each Runner's Pod in each Namespace is allowed to open, and serve at port 30000 (the gRPC port of a Runner), and the Controller can connect to it.
2. The Controller needs to download tar.gz BLOBs from the Source controller via port 80.
3. The Controller needs to post the events to the Notification controller via port 80.
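To make the three network paths above concrete, here is a minimal sketch of a TCP reachability check that could be run from inside the Controller pod. Only the port numbers (30000 and 80) come from the preflight checks; the host names, the `CHECKS` table, and the helper names `can_connect` and `run_checks` are assumptions for illustration and need to be replaced with the real addresses in your cluster. A successful TCP connect only shows that the port is open, not that the gRPC or HTTP exchange itself works.

```python
import socket

# Hypothetical endpoints: the host names are placeholders, not taken from the
# tf-controller docs. Only the port numbers come from the preflight checks.
CHECKS = [
    ("runner gRPC", "runner-pod.example.svc", 30000),
    ("source controller", "source-controller.flux-system.svc", 80),
    ("notification controller", "notification-controller.flux-system.svc", 80),
]


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def run_checks(checks=CHECKS) -> dict:
    """Return {label: reachable} for each (label, host, port) entry."""
    return {label: can_connect(host, port) for label, host, port in checks}
```

Calling `run_checks()` returns a dictionary mapping each label to `True` or `False`, which makes it easy to spot a NetworkPolicy or firewall rule that blocks one of the three paths.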
https://fluxcd.io/flux/flux-e2e/ is an excellent document that provides a very valuable security overview, but tf-controller is not covered in it
This is such a valuable thing to look at. Thank you so much for writing this up, @gberche-orange! Would love to take a look at it shortly.
A sequence diagram of tf-controller would be a good addition too.
- First, we need to draw a diagram of the connections between the Controller and the Runner, including port numbers
- https://github.com/weaveworks/tf-controller/tree/main/controllers: start with tf_controller to see the flow
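As a possible starting point for that diagram, the sketch below emits Mermaid `sequenceDiagram` text from the three connections listed in the preflight checks quoted earlier in this issue. The participant aliases, the `FLOWS` table, and the `to_mermaid` helper are illustrative assumptions, and the order of the arrows is a guess rather than something confirmed by the controller code; it should be corrected after reading `tf_controller`.

```python
# Connections taken from the preflight checks quoted in this issue.
# The aliases (C, R, S, N) and the arrow order are assumptions.
PARTICIPANTS = {
    "C": "Controller",
    "R": "Runner pod",
    "S": "Source controller",
    "N": "Notification controller",
}

FLOWS = [
    ("C", "S", "download tar.gz BLOB (port 80)"),
    ("C", "R", "gRPC calls (port 30000)"),
    ("C", "N", "post events (port 80)"),
]


def to_mermaid(participants=PARTICIPANTS, flows=FLOWS) -> str:
    """Render the participants and flows as Mermaid sequence diagram text."""
    lines = ["sequenceDiagram"]
    for alias, name in participants.items():
        lines.append(f"    participant {alias} as {name}")
    for src, dst, label in flows:
        # ->> draws a solid arrow from the caller to the callee.
        lines.append(f"    {src}->>{dst}: {label}")
    return "\n".join(lines)
```

The output of `print(to_mermaid())` can be pasted into a Mermaid code fence in the docs, and new rows can be added to `FLOWS` as the runner lifecycle and the state-file paths get documented.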