weave-gitops
GitOps Run Phase 2
In this phase we are still focusing on the platform operator who needs to configure a cluster and wants to see live feedback while working. Once they are done, they should be able to turn off `gitops run` and return to a GitOps paradigm.
Definitions
To help drive the conversation, we lay out some terminology for discussing things going forward.
Modes
- GitOps: the default mode we are always aiming for when using Weave GitOps. Whenever `gitops run` is not active, we want users to be in this mode. The cluster is driven by some mechanism reading from Git (ideally Flux), and that system applies those changes to the cluster.
- Run: the cluster has `gitops run` running on it. A live-reload session is in progress, and the cluster is no longer in a pure GitOps or Snowflake mode. Ideally, when `gitops run` stops, the cluster enters the GitOps mode defined above.
- Snowflake: a cluster driven by some mechanism outside of GitOps or Run. For example, a platform operator could have run various `kubectl apply` commands and installed a couple of Helm charts using `helm`. The only way for this cluster to reach this state again is to rerun those commands or to transition to GitOps mode.
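The three modes, together with the moves between them that this epic describes, can be sketched as a small state machine. This is a minimal Python sketch, assuming only those moves: entering Run from GitOps or Snowflake, leaving Run into GitOps, and moving a Snowflake cluster to GitOps. Leaving Run without entering GitOps is deliberately absent because it is listed as out of scope. The `Mode` enum and `transition` helper are hypothetical and not part of Weave GitOps.

```python
from enum import Enum


class Mode(Enum):
    GITOPS = "gitops"
    RUN = "run"
    SNOWFLAKE = "snowflake"


# Hypothetical transition table containing only the moves described in this epic.
ALLOWED = {
    Mode.GITOPS: {Mode.RUN},
    Mode.SNOWFLAKE: {Mode.RUN, Mode.GITOPS},
    Mode.RUN: {Mode.GITOPS},
}


def transition(current: Mode, target: Mode) -> Mode:
    """Return the new mode, or raise if the move is not one the epic describes."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# Starting gitops run on a snowflake cluster and then stopping it ends in GitOps mode.
mode = transition(Mode.SNOWFLAKE, Mode.RUN)
mode = transition(mode, Mode.GITOPS)
print(mode.value)
```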
Sessions
What happens when you start gitops run and then stop it? There are at least two different options that we could start with and they are defined below.
- Version: create a "side version" of the same workload. We would keep the existing workload running in the `dev` namespace but create a sub namespace, so that two versions run in different namespaces. This makes it easy to use services to compare the two, and it helps application teams scale and use fewer clusters.
- Replace: replace the existing workload on the cluster with what exists in the current working directory. This is much more invasive on the cluster and limits the number of people who can interact with the same workload.
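The practical difference between the two session types is where the workload lands and whether the existing one is touched. A minimal Python sketch, assuming a hypothetical `<namespace>-<session>` naming scheme for the sub namespace (the epic does not specify one); `SessionPlan` and `plan_session` are illustrative names only:

```python
from dataclasses import dataclass


@dataclass
class SessionPlan:
    target_namespace: str
    replaces_existing: bool


def plan_session(kind: str, namespace: str, session_name: str) -> SessionPlan:
    """Decide where a gitops run session deploys (hypothetical naming scheme)."""
    if kind == "version":
        # Leave the existing workload alone and deploy a side version
        # into a separate sub namespace, e.g. "dev-alice".
        return SessionPlan(f"{namespace}-{session_name}", replaces_existing=False)
    if kind == "replace":
        # Overwrite the workload in place with the current working directory.
        return SessionPlan(namespace, replaces_existing=True)
    raise ValueError(f"unknown session kind: {kind}")


print(plan_session("version", "dev", "alice"))
print(plan_session("replace", "dev", "alice"))
```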
Goals
As a user working on a cluster, I want to be able to migrate from any of these modes to GitOps mode. This means I should be able to enter Run mode on clusters that are in GitOps or Snowflake mode and, by the end, either return the cluster to GitOps mode or transition it into GitOps mode, respectively.
- I want to be able to switch between Run and GitOps mode.
- I want to be able to run `gitops run` on a cluster that is already configured with Flux and GitOps.
- In phase one we guaranteed that `gitops run` would work with Kind, Minikube, Docker Desktop, and k3d. By the end of this phase it should also work with remote clusters such as EKS, GKE, etc.
- The path can run against either Kustomizations or Helm Releases.
- As a user, I want to pull down someone's latest pull request and run `gitops run` against the Helm charts on that branch.
- As a user, I want to pull down someone's latest pull request and run `gitops run` against a kustomization overlay on that branch.
- I want to bypass bootstrapping when using `gitops run`.
- Running workloads safely in isolation.
- As a user, I want to be able to configure GitOps declaratively, i.e. by passing in a file.
- We can run the application in isolation via a vcluster or just load it directly onto the cluster.
- Multiple instances of `gitops run` can run against the cluster.
- As a platform operator, I want to be able to eliminate any lingering `gitops run` instances that may have been left on the cluster.
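Allowing several instances on one cluster while letting an operator clean up leftovers implies some way to tell a live session from an abandoned one. The epic does not say how sessions are tracked; this Python sketch assumes, hypothetically, that each session periodically records a heartbeat timestamp and treats any session that has not reported within a TTL as lingering. `RunSession`, `lingering_sessions`, and the five-minute default are illustrative choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunSession:
    name: str
    namespace: str
    last_heartbeat: datetime


def lingering_sessions(sessions, now, ttl=timedelta(minutes=5)):
    """Return sessions whose last heartbeat is older than ttl (hypothetical policy)."""
    return [s for s in sessions if now - s.last_heartbeat > ttl]


now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    RunSession("alice", "dev", now - timedelta(minutes=1)),  # still reporting
    RunSession("bob", "dev", now - timedelta(hours=2)),      # left behind
]
print([s.name for s in lingering_sessions(sessions, now)])
```

A heartbeat-based check keeps cleanup safe when several people share a cluster: only sessions that have gone quiet are candidates for removal.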
Canarying across two namespaces feels very much like Flagger, except that this is a point-in-time comparison with the ability to manually correct. The immediate question at that point (for me) is: how am I making that comparison? Can I diff those namespaces with a single command and show the change in results in either the UI or STDOUT?
A `gitops run --diff` flag could capture only the change by running `kubectl get "$(kubectl api-resources --namespaced=true --verbs=list -o name | tr '\n' ',' | sed -e 's/,$//')" -n <NAMESPACE>` on both namespaces. Adding `--verbose` could compare against the templated or kustomized source, in case there is config that isn't getting picked up or kustomized the way you think. Though it's way out of scope, that last bit would eventually include policy checks that surface how policies either 1) aren't working to prevent change or 2) are working too well and breaking things. That is a story in rapid prototyping and policy/security development.
Can add a user story if that's needed elsewhere.
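The proposed `gitops run --diff` is only an idea in this thread. Assuming each namespace's resources have already been listed one per line (for example with `kubectl get ... -o name`, which prints `resource/name` identifiers without the namespace), the comparison itself reduces to set differences. A minimal Python sketch with hypothetical helper names:

```python
def parse_inventory(output: str) -> set[str]:
    """Turn `kubectl get ... -o name` output (one resource per line) into a set."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def diff_namespaces(left: set[str], right: set[str]) -> dict[str, list[str]]:
    """Compare two inventories by resource identity only (not by spec contents)."""
    return {
        "only_in_left": sorted(left - right),
        "only_in_right": sorted(right - left),
        "in_both": sorted(left & right),
    }


dev = parse_inventory("deployment.apps/nginx\nservice/nginx\nconfigmap/nginx-config\n")
side = parse_inventory("deployment.apps/nginx\nservice/nginx\n")
result = diff_namespaces(dev, side)
print(result["only_in_left"])
```

This only reports which objects exist on each side. Comparing their specs, or the `--verbose` comparison against templated sources, would need more than resource names.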
In the Out of Scope section, it says:
We cannot gracefully handle "raw" helm charts.
But the rest of the document mentions helm releases multiple times - for example:
The path can run against both Kustomization or Helm Releases.
What is the helm release here?
We cannot gracefully handle "raw" helm charts.
This was a mistake, I removed this from out of scope.
Canarying across two namespaces feels very much like Flagger, except that this is a point-in-time comparison with the ability to manually correct.
Right, I guess it is a similar model, but it stays instead of transitioning traffic.
Can I diff those namespaces with a single command and show the change in results in either the UI or STDOUT?
You have logging in GitOps Run. I am spinning up a separate UX epic that will tackle some of these very things. I think the diff would be super valuable if you can cleanly see it in the UI. There will be a similar logout as well. Plus, if multiple people are running `gitops run` on the cluster, you would easily be able to see those sessions and click into any of them. At least that is where my head is at.
THIS IS THE OLD VERSION OF THE EPIC. WE HAVE DONE A LOT OF RESEARCH AND I AM MOVING IT HERE FOR HISTORICAL PURPOSES. I AM GOING TO UPDATE THE EPIC NOW BASED ON THE RESEARCH WE COMPLETED.
In this phase we are still focusing on the platform operator who needs to configure a cluster and wants to see live feedback while working. Once they are done, they should be able to turn off `gitops run` and return to a GitOps paradigm.
Definitions
To help drive the conversation, we lay out some terminology for discussing things going forward.
Modes
- GitOps: the default mode we are always aiming for when using Weave GitOps. Whenever `gitops run` is not active, we want users to be in this mode. The cluster is driven by some mechanism reading from Git (ideally Flux), and that system applies those changes to the cluster.
- Run: the cluster has `gitops run` running on it. A live-reload session is in progress, and the cluster is no longer in a pure GitOps or Snowflake mode. Ideally, when `gitops run` stops, the cluster enters the GitOps mode defined above.
- Snowflake: a cluster driven by some mechanism outside of GitOps or Run. For example, a platform operator could have run various `kubectl apply` commands and installed a couple of Helm charts using `helm`. The only way for this cluster to reach this state again is to rerun those commands or to transition to GitOps mode.
Sessions
What happens when you start `gitops run` and then stop it? There are at least two different options that we could start with, and they are defined below.
- Version: create a "side version" of the same workload. We would keep the existing workload running in the `dev` namespace but create a sub namespace, so that two versions run in different namespaces. This makes it easy to use services to compare the two, and it helps application teams scale and use fewer clusters.
- Replace: replace the existing workload on the cluster with what exists in the current working directory. This is much more invasive on the cluster and limits the number of people who can interact with the same workload.
Goals
As a user working on a cluster, I want to be able to migrate from any of these modes to GitOps mode. This means I should be able to enter Run mode on clusters that are in GitOps or Snowflake mode and, by the end, either return the cluster to GitOps mode or transition it into GitOps mode, respectively.
- I want to be able to switch between Run and GitOps mode.
- I want to be able to run `gitops run` on a cluster that is already configured with Flux and GitOps.
- In phase one we guaranteed that `gitops run` would work with Kind, Minikube, Docker Desktop, and k3d. By the end of this phase it should also work with remote clusters such as EKS, GKE, etc.
- The path can run against either Kustomizations or Helm Releases.
Out of Scope
While we can offer the capability to not enter GitOps mode after running `gitops run`, that functionality is out of scope for this epic.
Running "version" sessions is out of scope. We will focus on "replace" sessions only for this epic.
We are not going to worry about Flux being installed anywhere other than `flux-system`.
User Stories
- As a developer, I want to be able to run a simple command that lets me point to a working directory. This working directory should be a Git repository. The upstream of this Git repository (i.e. GitHub, GitLab, Bitbucket) is irrelevant.
- As a developer, I want to be able to specify the directory that contains my manifests when working with sync. This will enable these manifests to be pulled in locally so you can see the changes get applied. I should see a new cloud-native `Bucket` source appear that contains the manifests for that branch.
- As a developer, when I turn off sync, the temporary objects should be removed from my cluster and the cluster should be back in the state that Flux had put it in.
- As a developer, as I make changes to my manifests in my local repository, I should see those changes reflected on the cluster as the files are uploaded to the `Bucket` on the cluster. For example, if I create a new ConfigMap with various values, I should see that ConfigMap on the cluster without taking any additional action such as `kubectl apply` or pushing my changes to a remote Git server for Flux to sync.
- As a user, I expect non-namespaced resources (such as CRDs) to be handled gracefully. This means that if I am running `gitops run` against a path that installs Tekton, Contour, or some other tooling with CRDs, `gitops run` is able to gracefully transition into Run mode.
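The live-reload story needs some way to decide which local files must be re-uploaded to the `Bucket` and which must be removed from it. The epic does not specify the mechanism; this Python sketch assumes a simple approach of hashing every file under the working directory and comparing two snapshots. A file watcher would be an alternative, and the actual upload to the `Bucket` is not shown. `snapshot` and `changes` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (relative path) to a SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def changes(before: dict[str, str], after: dict[str, str]):
    """Return (files to upload, files to delete) between two snapshots."""
    upload = sorted(k for k, h in after.items() if before.get(k) != h)
    delete = sorted(k for k in before if k not in after)
    return upload, delete


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.yaml").write_text("kind: Deployment\n")
    before = snapshot(root)
    # The developer adds a ConfigMap and removes the old manifest.
    (root / "configmap.yaml").write_text("kind: ConfigMap\n")
    (root / "app.yaml").unlink()
    upload, delete = changes(before, snapshot(root))

print(upload, delete)
```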
Here is a legend for executing `gitops run`:

```mermaid
flowchart TD
id1[GitOps Run path]
id2[GitOps Run root directory]
id3[Overridden Flux Object]
id4[Files created by GitOps Run]
id5[Synced directories]
style id1 fill:#5fd2e8,stroke:#000,color:#000
style id2 fill:#ebd08b,stroke:#000,color:#000
style id3 fill:#d12f82,stroke:#000,color:#000
style id4 fill:#6eed9e,stroke:#000,color:#000
```
GitOps Run used for the first time on a new cluster
- As a developer, I should be able to see changes appear live in the GitOps dashboard under the requisite Kustomization or HelmRelease.
Criteria
- This is a brand new cluster.
- Flux is not already installed on the cluster.
- The workload does not exist on the cluster.
- All manifests are in the same repository.
- `gitops run` is run on the basic path.
```mermaid
%%{init: { logLevel:0, startOnLoad: false, themeCSS:'.label { font-family: Source Sans Pro,Helvetica Neue,Arial,sans-serif; }' }}%%
flowchart LR
run[gitops run] --> cluster
root <--> run
subgraph root[root path: ./ ]
subgraph sgp6 [ ]
subgraph sgp7 [ ]
subgraph clusters[./clusters]
subgraph sgp1 [ ]
subgraph my-cluster[./my-cluster]
subgraph sgp2 [ ]
dboard[gitops-dashboard.yaml]
subgraph flux[./flux-system]
a[gotk-components.yaml]
b[gotk-sync.yaml]
c[kustomization.yaml]
end
end
end
end
end
end
end
end
subgraph cluster
direction TB
subgraph sgp5 [ ]
c-a[We install Flux CRDs and Controllers]
c-b[Temp Bucket and Kustomization]
c-c[Dev Bucket Server]
end
end
classDef subgraph_padding fill:#000,stroke:none
class sgp1,sgp2,sgp3,sgp4,sgp5 subgraph_padding
classDef basic fill:#000;
class clusters,my-cluster,flux,app,cluster basic;
style root fill:#ebd08b,stroke:#000,color:#000
style sgp6 fill:#ebd08b,stroke:transparent,color:#000
style sgp7 fill:#5fd2e8,stroke:transparent,color:#000
classDef create fill:#6eed9e,stroke:#000,color:#000;
class a,b,c,dboard create
```
Criteria
- As a developer, I expect to see a relevant Kustomization or HelmRelease that syncs with the cloud-native `Bucket` source created above.
- As a developer, I might not have access to certain values, such as database paths or keys, that are located on the relevant dev HelmRelease. These values should be copied over to the new temporary HelmRelease for the branch.
- As a developer, when I switch to a new branch in my repository, I should see my changes clear out of my temporary environment. The Kustomization or HelmRelease should reflect the changes that are in my local working directory.
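Copying the dev HelmRelease values into the temporary HelmRelease needs a precedence rule that the epic leaves open. This Python sketch assumes, as an illustration, a deep merge in which values from the local branch win and the dev values fill in anything the developer cannot see, such as database settings. Flux HelmReleases can also pull values from ConfigMaps or Secrets through `spec.valuesFrom`; this sketch only covers inline `spec.values`, and `merge_values` is a hypothetical helper.

```python
import copy


def merge_values(dev: dict, local: dict) -> dict:
    """Deep-merge HelmRelease values: local values win, dev values fill the gaps."""
    merged = copy.deepcopy(dev)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a nested mapping: merge recursively.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars, lists, or new keys: the local branch value replaces dev.
            merged[key] = copy.deepcopy(value)
    return merged


dev_values = {"database": {"host": "db.dev.example", "password": "from-secret"}, "replicas": 2}
branch_values = {"replicas": 1, "image": {"tag": "feature-x"}}
merged = merge_values(dev_values, branch_values)
print(merged)
```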
Scenario
- Flux is already installed on the cluster.
- The workload already exists on the cluster.
- The user is trying to update an existing workload and is running it in a specific directory.
- All manifests are in the same repository.
```mermaid
%%{init: { logLevel:0, startOnLoad: false, themeCSS:'.label { font-family: Source Sans Pro,Helvetica Neue,Arial,sans-serif; }' }}%%
flowchart LR
run[gitops run] --> cluster
root <--> run
subgraph root[root path ./ ]
subgraph sgp6 [ ]
subgraph clusters[./clusters]
subgraph sgp1 [ ]
subgraph my-cluster[./my-cluster]
subgraph sgp2 [ ]
dboard[gitops-dashboard.yaml]
subgraph app[./app]
d[dev-ks.yaml]
end
subgraph flux[./flux-system]
direction TB
a[gotk-components.yaml]
b[gotk-sync.yaml]
c[kustomization.yaml]
end
end
end
end
end
subgraph app-manifests[./app]
subgraph sgp3 [ ]
31[app.yaml]
end
end
subgraph dev[./dev]
subgraph sgp4 [ ]
21[nginx.yaml]
22[ns.yaml]
end
end
end
end
subgraph cluster
direction TB
subgraph sgp5 [ ]
c-a[Flux already exists,\n no need to install]
d --> c-d[Paused kustomization]
c-b[Temp Bucket and Kustomization]
c-c[Dev Bucket Server]
end
end
classDef subgraph_padding fill:none,stroke:none
class sgp1,sgp2,sgp3,sgp4,sgp5 subgraph_padding
classDef basic fill:#000;
class app-manifests,clusters,my-cluster,flux,app,cluster basic;
style dev fill:#5fd2e8,stroke:#000,color:#000;
style root fill:#ebd08b,stroke:#000,color:#000;
style sgp6 fill:#ebd08b,stroke:transparent,color:#000;
style d fill:#d12f82,stroke:#000,color:#000;
style c-d fill:#d12f82,stroke:#000,color:#000;
```
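When Flux already exists, the diagram shows the existing `dev-ks.yaml` Kustomization being paused while a temporary Bucket and Kustomization take over, and the user stories expect the temporary objects to disappear when the session ends. Flux Kustomizations expose a `spec.suspend` field for pausing reconciliation. The Python sketch below models that start/stop lifecycle against a hypothetical in-memory `FakeCluster` rather than the Kubernetes API; the `gitops-run/session` label used to find temporary objects is an assumption, and the dev Bucket server is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for the cluster: object name -> manifest dict."""
    objects: dict[str, dict] = field(default_factory=dict)


TEMP_LABEL = "gitops-run/session"  # hypothetical label marking temporary objects


def start_run(cluster: FakeCluster, existing_ks: str, session: str) -> None:
    # Pause the Flux Kustomization that already owns the workload.
    cluster.objects[existing_ks]["spec"]["suspend"] = True
    # Create the temporary Bucket source and Kustomization for this session.
    for kind in ("Bucket", "Kustomization"):
        cluster.objects[f"{kind.lower()}/run-{session}"] = {
            "kind": kind,
            "metadata": {"labels": {TEMP_LABEL: session}},
            "spec": {},
        }


def stop_run(cluster: FakeCluster, existing_ks: str, session: str) -> None:
    # Remove every temporary object created by this session...
    temp = [n for n, o in cluster.objects.items()
            if o["metadata"].get("labels", {}).get(TEMP_LABEL) == session]
    for name in temp:
        del cluster.objects[name]
    # ...and resume the original Kustomization so Flux drives the cluster again.
    cluster.objects[existing_ks]["spec"]["suspend"] = False


cluster = FakeCluster({"kustomization/dev": {"kind": "Kustomization", "metadata": {}, "spec": {"suspend": False}}})
start_run(cluster, "kustomization/dev", "alice")
during = sorted(cluster.objects)
stop_run(cluster, "kustomization/dev", "alice")
print(during, sorted(cluster.objects))
```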
I am putting a note here so I do not forget to consider what it means to run port forwarding on leaf clusters in the enterprise version.
I am closing this epic. The team has made numerous bug fixes and quality of life improvements to gitops run.