Create a design document for the controller
Motivation
I started some "R'n'D" (scare quotes intended) on implementing scale-up, scale-down, self-healing and so on, and quickly realized that coding the member add, member remove and similar steps is the easier part of the undertaking. The difficult part is coming up with a working algorithm that correctly deduces the cluster's state and executes the necessary actions at the right time.
To reason better about the controller's algorithm now, and to develop it further later, I feel it is important to have good documentation of the current design and the intended next steps. I therefore started by documenting the current state of the code.
Results
This document contains a mermaid flowchart that outlines the reconciliation loop. It is better viewed in rendered form.
Going forward, I envision this document serving at least three purposes:
- Let the developers spot flaws and prompt them to open issues.
- Act as a more detailed form of documentation for advanced users.
- Be a blueprint for implementing anything non-trivial.
Could you please move the design subdirectory into website?
I am a bit confused by this section of the flow:
I would suggest a slightly updated way to control resources in this case:
Also, I didn't understand the purpose of these steps:
- "Promote any learners."
- "Ensure StatefulSet with replicas = max member ordinal + 1"
I mean, could you please list the cases that these checks are meant to guard against?
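To make the second step concrete, here is a minimal Go sketch of the "replicas = max member ordinal + 1" computation. It assumes that member names follow the StatefulSet pod naming scheme `<name>-<ordinal>`. The helpers `ordinalOf` and `desiredReplicas` and the `etcd-N` names are hypothetical and are not taken from the controller code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ordinalOf extracts the trailing ordinal from a StatefulSet-style name
// such as "etcd-2". It reports false if the name has no numeric suffix.
// Hypothetical helper, not part of the controller.
func ordinalOf(name string) (int, bool) {
	i := strings.LastIndex(name, "-")
	if i < 0 {
		return 0, false
	}
	n, err := strconv.Atoi(name[i+1:])
	if err != nil {
		return 0, false
	}
	return n, true
}

// desiredReplicas returns max member ordinal + 1, or 0 when no member
// name carries an ordinal. Hypothetical helper, not part of the controller.
func desiredReplicas(members []string) int {
	replicas := 0
	for _, m := range members {
		if n, ok := ordinalOf(m); ok && n+1 > replicas {
			replicas = n + 1
		}
	}
	return replicas
}

func main() {
	// Contiguous membership: ordinals 0..2 need 3 replicas.
	fmt.Println(desiredReplicas([]string{"etcd-0", "etcd-1", "etcd-2"}))
	// Membership with a gap: ordinal 2 still needs 3 replicas.
	fmt.Println(desiredReplicas([]string{"etcd-0", "etcd-2"}))
}
```

One possible reading of the step: a StatefulSet with N replicas runs pods with ordinals 0 to N-1, so if a member with ordinal 2 exists while member 1 is missing, counting members would give 2 replicas and remove the pod backing member 2, whereas max ordinal + 1 keeps it. Whether this is the case the check is meant to cover is for the author to confirm.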
Let's move it into the architecture section of the docs, and we can merge it after v0.3.