
GitOps Run Phase 2

Open JamWils opened this issue 2 years ago • 6 comments

In this phase we are still focusing on the platform operator who needs to configure a cluster and wants to see live feedback as they work. Once they are done, they should be able to turn off gitops run and return to a GitOps paradigm.

Definitions

To help drive conversations we are going to lay out some terminology to help us discuss things moving forward.

Modes

  1. GitOps: this is the default mode we are always aiming for when using Weave GitOps. Whenever gitops run is not active we want users to be in this mode. This means that the cluster is being driven by some mechanism reading from git, ideally flux, and that system is applying those changes to the cluster.
  2. Run: this is when the cluster has gitops run running on it. A live reload session is occurring and the cluster is no longer in a pure GitOps or Snowflake mode. Ideally, when gitops run stops, the cluster enters the GitOps mode defined above.
  3. Snowflake: this refers to a cluster that is driven by some mechanism other than GitOps or Run. For example, a platform operator could have run various kubectl apply commands and installed a couple of helm charts using helm. The only way for this cluster to reach this state again is to rerun those commands or to transition to GitOps mode.
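
The three modes and the transitions between them can be sketched as a small state machine. This is only an illustration of the terminology above, not how gitops run is implemented; the names Mode, start_run, and stop_run are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    GITOPS = "gitops"        # cluster driven from git, ideally by Flux
    RUN = "run"              # a gitops run live reload session is active
    SNOWFLAKE = "snowflake"  # cluster driven by ad hoc kubectl/helm commands


def start_run(current: Mode) -> Mode:
    """Enter Run mode. Allowed from any mode, including Run itself,
    since more than one session may target the same cluster."""
    return Mode.RUN


def stop_run(current: Mode) -> Mode:
    """Leave Run mode. The stated ideal is to always land in GitOps mode,
    even when the session started from a Snowflake cluster."""
    if current is not Mode.RUN:
        raise ValueError("no gitops run session to stop")
    return Mode.GITOPS
```

Note that stop_run never returns SNOWFLAKE: a cluster that entered Run mode from Snowflake mode is expected to come out in GitOps mode.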

Sessions

What happens when you start gitops run and then stop it? There are at least two different options that we could start with and they are defined below.

  1. Version: create a "side version" of the same workload. This means we would keep something running in the dev namespace, but would create a sub namespace so you have two versions running in different namespaces. This would be useful for quickly comparing the two through their services. It also helps application teams scale and use fewer clusters.
  2. Replace: this will replace the existing workload on the cluster with what exists in the current working directory. This is much more invasive on the cluster and limits the number of people that can interact with the same workload.
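
The difference between the two session types comes down to where the workload lands. A minimal sketch, assuming a hypothetical naming scheme in which a "version" session appends a session id to the base namespace (the epic does not define the actual naming):

```python
def target_namespace(base: str, session_kind: str, session_id: str) -> str:
    """Return the namespace a session would deploy into.

    The "<base>-<session_id>" pattern is an assumption made for
    illustration; the epic only says a sub namespace is created.
    """
    if session_kind == "replace":
        # Replace overwrites the existing workload in place.
        return base
    if session_kind == "version":
        # Version runs a side copy next to the original workload.
        return f"{base}-{session_id}"
    raise ValueError(f"unknown session kind: {session_kind!r}")
```

With this scheme, two developers running "version" sessions against dev would get separate namespaces, while two "replace" sessions would contend for the same one.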

Goals

As a user, when I am working on a cluster I want to be able to move between the various modes and end up in GitOps mode. This means I should be able to enter run mode on clusters that are in gitops or snowflake mode and, by the end, either return the cluster to gitops mode or transition it into gitops mode, respectively.

  1. I want to be able to switch between run and gitops mode.
  2. Being able to run gitops run on a cluster that is already configured with Flux and GitOps.
  3. With phase one we guaranteed that gitops run would work with Kind, Minikube, Docker Desktop, and k3d. By the end of this phase we should be able to work with remote clusters such as EKS, GKE, etc.
  4. The path can run against both Kustomization or Helm Releases.
  5. As a user, I want to pull down someone's latest pull request and run gitops run against the helm charts on that branch.
  6. As a user, I want to pull down someone's latest pull request and run gitops run against a kustomization overlay on that branch.
  7. I want to bypass bootstrapping when using gitops run.
  8. Running workloads safely in isolation.
  9. As a user, I want to be able to declaratively configure GitOps, that is, pass in a file.
  10. We can run the application in isolation via a vcluster or just load it directly onto the cluster.
  11. Multiple instances of gitops run can run against the cluster.
  12. As a platform operator, I want to be able to eliminate any lingering gitops run instances that may have been left on the cluster.
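
Goals 11 and 12 together imply some way of telling which objects belong to which gitops run session. One possible approach, sketched here, is to tag every object a session creates with a label and treat objects whose session is no longer active as lingering. The label key below is hypothetical and is not something gitops run is known to set.

```python
# Hypothetical label key, used only for this illustration.
SESSION_LABEL = "example.com/gitops-run-session"


def lingering_resources(resources, active_sessions):
    """Return (kind, namespace, name) for objects left by dead sessions.

    `resources` is a list of Kubernetes objects as plain dicts;
    `active_sessions` is a set of session ids that are still running.
    Objects without the session label are never touched.
    """
    stale = []
    for res in resources:
        meta = res.get("metadata", {})
        session = meta.get("labels", {}).get(SESSION_LABEL)
        if session is not None and session not in active_sessions:
            stale.append((res.get("kind"), meta.get("namespace"), meta.get("name")))
    return stale
```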

JamWils avatar Jul 19 '22 02:07 JamWils

Canarying across two namespaces feels very much like flagger, except that this is point in time comparison with the ability to manually correct. The immediate question at that point (for me) is how am I making that comparison? Can I diff those name spaces with a single command and show the change in results in either UI or STDOUT

gitops run --diff could capture only the changes by running kubectl get "$(kubectl api-resources --verbs=list -o name | tr '\n' ',')<NAMESPACE>" on both namespaces. Add --verbose to compare against the templated or kustomized source, in case there is config which isn't getting picked up or kustomized the way you think. Though it's way out of scope, that last bit will involve policy checks; surfacing how those either 1) aren't working to prevent change or 2) are working too well and breaking things is a story in rapid prototyping and policy/security development.

Can add a user story if that's needed elsewhere.
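
The comparison itself could be as simple as the sketch below once both namespaces have been listed. It assumes each namespace has already been reduced to a mapping from a "Kind/name" key to the rendered manifest text; the diff flag is only a proposal in this comment and diff_namespaces is a hypothetical helper.

```python
def diff_namespaces(left: dict, right: dict) -> dict:
    """Compare two namespaces given as {"Kind/name": manifest_text}.

    Returns sorted lists of keys that were added in `right`,
    removed from `left`, or present in both with different content.
    """
    added = sorted(right.keys() - left.keys())
    removed = sorted(left.keys() - right.keys())
    changed = sorted(k for k in left.keys() & right.keys() if left[k] != right[k])
    return {"added": added, "removed": removed, "changed": changed}
```

A real comparison would also need to strip fields that always differ between namespaces, such as metadata.namespace, uid, and resourceVersion, before comparing.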

fire-ant avatar Jul 27 '22 11:07 fire-ant

In out of scope, it says

We cannot gracefully handle "raw" helm charts.

But the rest of the document mentions helm releases multiple times - for example:

The path can run against both Kustomization or Helm Releases.

What is the helm release here?

ozamosi avatar Jul 27 '22 15:07 ozamosi

We cannot gracefully handle "raw" helm charts.

This was a mistake, I removed this from out of scope.

JamWils avatar Jul 27 '22 18:07 JamWils

Canarying across two namespaces feels very much like flagger, except that this is point in time comparison with the ability to manually correct.

Right, I guess it is a similar model, but it stays instead of transitioning traffic.

Can I diff those name spaces with a single command and show the change in results in either UI or STDOUT

You have logging on GitOps Run. I am spinning up a separate UX epic that will tackle some of these very things. I think the diff would be super valuable if you can cleanly see it in the UI. There will be a similar logout as well. Plus, if multiple people are running gitops run on the cluster, you would easily be able to see those sessions and click into any of them. At least that is where my head is at.

JamWils avatar Jul 27 '22 19:07 JamWils

THIS IS THE OLD VERSION OF THE EPIC. WE HAVE DONE A LOT OF RESEARCH AND I AM MOVING IT HERE FOR HISTORICAL PURPOSES. I AM GOING TO UPDATE THE EPIC NOW BASED ON THE RESEARCH WE COMPLETED.

In this phase we are still focusing on the platform operator who needs to configure a cluster and wants to see live feedback as they work. Once they are done, they should be able to turn off gitops run and return to a GitOps paradigm.

Definitions

To help drive conversations we are going to lay out some terminology to help us discuss things moving forward.

Modes

  1. GitOps: this is the default mode we are always aiming for when using Weave GitOps. Whenever gitops run is not active we want users to be in this mode. This means that the cluster is being driven by some mechanism reading from git, ideally flux, and that system is applying those changes to the cluster.
  2. Run: this is when the cluster has gitops run running on it. A live reload session is occurring and the cluster is no longer in a pure GitOps or Snowflake mode. Ideally, when gitops run stops, the cluster enters the GitOps mode defined above.
  3. Snowflake: this refers to a cluster that is driven by some mechanism other than GitOps or Run. For example, a platform operator could have run various kubectl apply commands and installed a couple of helm charts using helm. The only way for this cluster to reach this state again is to rerun those commands or to transition to GitOps mode.

Sessions

What happens when you start gitops run and then stop it? There are at least two different options that we could start with and they are defined below.

  1. Version: create a "side version" of the same workload. This means we would keep something running in the dev namespace, but would create a sub namespace so you have two versions running in different namespaces. This would be useful for quickly comparing the two through their services. It also helps application teams scale and use fewer clusters.
  2. Replace: this will replace the existing workload on the cluster with what exists in the current working directory. This is much more invasive on the cluster and limits the number of people that can interact with the same workload.

Goals

As a user, when I am working on a cluster I want to be able to move between the various modes and end up in GitOps mode. This means I should be able to enter run mode on clusters that are in gitops or snowflake mode and, by the end, either return the cluster to gitops mode or transition it into gitops mode, respectively.

  1. I want to be able to switch between run and gitops mode.
  2. Being able to run gitops run on a cluster that is already configured with Flux and GitOps.
  3. With phase one we guaranteed that gitops run would work with Kind, Minikube, Docker Desktop, and k3d. By the end of this phase we should be able to work with remote clusters such as EKS, GKE, etc.
  4. The path can run against both Kustomization or Helm Releases.

Out of Scope

While we can offer the capability to not enter GitOps mode after running gitops run, that functionality is out of scope for this epic.

Running "version" sessions is out of scope. We will focus on "replace" sessions only for this epic.

We are not going to worry about flux being installed anywhere else except flux-system.

User Stories

  • As a developer, I want to be able to run a simple command that will allow me to point to a working directory. This working directory should be a git repository. The upstream of this git repository (i.e. GitHub, GitLab, BitBucket) is irrelevant.

  • As a developer, I want to be able to specify the directory which contains my manifests when working with sync. This will enable these manifests to be pulled in locally so you can see the changes get applied. I should see a new cloud native Bucket source appear which contains the manifests for that branch.

  • As a developer, when I turn off sync then the temporary objects should be removed from my cluster and the cluster should be back in the state that Flux had put it in.

  • As a developer, as I make changes to my manifests in my local repository I should see those changes reflected on the cluster as the files are uploaded to the Bucket on the cluster. For example, if I create a new ConfigMap with various values then I should see that ConfigMap on the cluster without taking any additional action such as kubectl apply or “pushing” my changes to a remote git server for Flux to sync.

  • As a user, I expect non-namespaced entities to be handled gracefully. This means that if I am running gitops run against a path that installs Tekton, Contour, or some other tooling with CRDs, gitops run is able to gracefully transition into run mode.
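
The live sync described in the stories above depends on noticing which local files changed so that only those are uploaded to the Bucket. A minimal sketch of one way to do that, by hashing the working directory before and after an edit; snapshot and changed_paths are hypothetical helpers and the actual tool may rely on filesystem notifications instead.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file under `root` (as a relative POSIX path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_paths(before: dict, after: dict):
    """Return (to_upload, to_delete) between two snapshots.

    New and modified files need uploading; files that disappeared
    need removing from the bucket so the cluster can prune them.
    """
    to_upload = sorted(k for k, digest in after.items() if before.get(k) != digest)
    to_delete = sorted(before.keys() - after.keys())
    return to_upload, to_delete
```

In the ConfigMap example above, adding a new manifest file would put its path in to_upload, and nothing else would be sent.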


Here is a legend for executing gitops run

flowchart TD
    id1[GitOps Run path]
    id2[GitOps Run root directory]
    id3[Overridden Flux Object]
    id4[Files created by GitOps Run]
    id5[Synced directories]

    style id1 fill:#5fd2e8,stroke:#000,color:#000
    style id2 fill:#ebd08b,stroke:#000,color:#000
    style id3 fill:#d12f82,stroke:#000,color:#000
    style id4 fill:#6eed9e,stroke:#000,color:#000

GitOps Run used for the first time on a new cluster

  • As a developer, I should be able to see changes appear live in the GitOps dashboard under the requisite Kustomization or HelmRelease.

Criteria

  1. This is a brand new cluster
  2. Flux is not already installed on the cluster.
  3. The workload does not exist on the cluster.
  4. All manifests are in the same repository.
  5. gitops run is run on the basic path
%%{init: { logLevel:0, startOnLoad: false, themeCSS:'.label { font-family: Source Sans Pro,Helvetica Neue,Arial,sans-serif; }' }}%%
flowchart LR
    run[gitops run] --> cluster
    root <--> run
    subgraph root[root path: ./ ]
    subgraph sgp6 [ ]
    subgraph sgp7 [ ]
    subgraph clusters[./clusters]
    subgraph sgp1 [ ]
      subgraph my-cluster[./my-cluster]
        subgraph sgp2 [ ]
        dboard[gitops-dashboard.yaml]
        subgraph flux[./flux-system]
          a[gotk-components.yaml] 
          b[gotk-sync.yaml]
          c[kustomization.yaml]
        end
        end
      end
    end  
    end
    end
    end
    
    end
    subgraph cluster
      direction TB
      subgraph sgp5 [ ]
        c-a[We install Flux CRDs and Controllers]
        c-b[Temp Bucket and Kustomization]
        c-c[Dev Bucket Server]
      end
    end
      
    
classDef subgraph_padding fill:#000,stroke:none
class sgp1,sgp2,sgp3,sgp4,sgp5 subgraph_padding

classDef basic fill:#000;
class clusters,my-cluster,flux,app,cluster basic;

style root fill:#ebd08b,stroke:#000,color:#000
style sgp6 fill:#ebd08b,stroke:transparent,color:#000
style sgp7 fill:#5fd2e8,stroke:transparent,color:#000
classDef create fill:#6eed9e,stroke:#000,color:#000;
class a,b,c,dboard create
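
The cluster-side boxes in the diagram above, together with the variant for a cluster that already runs Flux, amount to a short decision list. The sketch below only restates what the diagrams show; the function name and the ordering of the steps are assumptions.

```python
def plan_run(flux_installed: bool, workload_exists: bool) -> list:
    """List the cluster-side actions for starting a session.

    Step wording follows the diagrams; the order is an assumption.
    """
    steps = []
    if not flux_installed:
        # Brand new cluster: Flux has to be put in place first.
        steps.append("install Flux CRDs and controllers")
    if workload_exists:
        # Stop Flux from reverting the live changes mid-session.
        steps.append("pause the existing Kustomization")
    steps.append("start the dev bucket server")
    steps.append("create the temporary Bucket and Kustomization")
    return steps
```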

Criteria

  • As a developer, I expect to see a relevant Kustomization or HelmRelease that will sync with the cloud native Bucket source that was created above.
  • As a developer, I might not have access to certain values such as database paths or keys which are located on the relevant dev HelmRelease. These values should be copied over to the new temporary HelmRelease that is relevant to the branch.
  • As a developer, when I switch to a new branch on my repository I should see my changes clear out of my temporary environment. The Kustomization or HelmRelease should reflect the changes that are in my local working directory.
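
Copying values from the dev HelmRelease onto a temporary one, as the second criterion asks, can be sketched on plain dicts. The field paths (spec.values, spec.valuesFrom, spec.chart.spec.sourceRef) follow the Flux HelmRelease schema; the function name and the "-run" suffix are hypothetical.

```python
import copy


def temp_helm_release(dev_release: dict, bucket_name: str, suffix: str = "-run") -> dict:
    """Derive a temporary HelmRelease from the dev one.

    The deep copy carries over spec.values and spec.valuesFrom, so
    settings the developer cannot read locally (database paths, keys)
    still reach the temporary release. Only the name and the chart
    source change; the chart source is pointed at the session's Bucket.
    """
    temp = copy.deepcopy(dev_release)
    temp["metadata"] = {
        "name": dev_release["metadata"]["name"] + suffix,
        "namespace": dev_release["metadata"]["namespace"],
    }
    temp["spec"]["chart"]["spec"]["sourceRef"] = {"kind": "Bucket", "name": bucket_name}
    return temp
```

Replacing metadata wholesale also drops server-populated fields such as resourceVersion and uid, which must not be sent when creating a new object.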

Scenario

  1. Flux is already installed on the cluster.
  2. The workload already exists on the cluster.
  3. The user is trying to update an existing workload and is running it in a specific directory.
  4. All manifests are in the same repository.
%%{init: { logLevel:0, startOnLoad: false, themeCSS:'.label { font-family: Source Sans Pro,Helvetica Neue,Arial,sans-serif; }' }}%%
flowchart LR
    run[gitops run] --> cluster
    root <--> run
    subgraph root[root path ./ ]
    subgraph sgp6 [ ]
    subgraph clusters[./clusters]
    subgraph sgp1 [ ]
      subgraph my-cluster[./my-cluster]
        subgraph sgp2 [ ]
        dboard[gitops-dashboard.yaml]
        subgraph app[./app]
          d[dev-ks.yaml]
        end
        subgraph flux[./flux-system]
          direction TB
          a[gotk-components.yaml] 
          b[gotk-sync.yaml]
          c[kustomization.yaml]
        end
        end
      end
    end  
    end
      
    subgraph app-manifests[./app]
    subgraph sgp3 [ ]
      31[app.yaml]
      end
    end
    subgraph dev[./dev]
    subgraph sgp4 [ ]
      21[nginx.yaml]
      22[ns.yaml]
    end
    end
    end
    end

    subgraph cluster
      direction TB
      subgraph sgp5 [ ]
        c-a[Flux already exists,\n no need to install]
        d --> c-d[Paused kustomization]
        c-b[Temp Bucket and Kustomization]
        c-c[Dev Bucket Server]
      end
    end
    
    
classDef subgraph_padding fill:none,stroke:none
class sgp1,sgp2,sgp3,sgp4,sgp5 subgraph_padding

classDef basic fill:#000;
class app-manifests,clusters,my-cluster,flux,app,cluster basic;

style dev fill:#5fd2e8,stroke:#000,color:#000;
style root fill:#ebd08b,stroke:#000,color:#000;
style sgp6 fill:#ebd08b,stroke:transparent,color:#000;
style d fill:#d12f82,stroke:#000,color:#000;
style c-d fill:#d12f82,stroke:#000,color:#000;
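
The "Paused kustomization" and "Temp Bucket and Kustomization" boxes in the diagram above could correspond to objects like the ones built below. This is a sketch on plain dicts: spec.suspend, and the Bucket and Kustomization fields used, follow the Flux API as I understand it, but the v1beta2 API versions, the 30s interval, the object names, and the helper functions themselves are assumptions.

```python
import copy


def suspend(kustomization: dict) -> dict:
    """Return a copy of a Flux Kustomization with reconciliation paused."""
    paused = copy.deepcopy(kustomization)
    paused.setdefault("spec", {})["suspend"] = True
    return paused


def temp_objects(session: str, namespace: str, endpoint: str, path: str):
    """Build the temporary Bucket source and the Kustomization that syncs it.

    `endpoint` is the address of the in-cluster dev bucket server and
    `path` is the directory gitops run was pointed at.
    """
    bucket = {
        "apiVersion": "source.toolkit.fluxcd.io/v1beta2",  # assumed version
        "kind": "Bucket",
        "metadata": {"name": session, "namespace": namespace},
        "spec": {
            "interval": "30s",
            "provider": "generic",
            "bucketName": session,
            "endpoint": endpoint,
            "insecure": True,  # assumes the dev server speaks plain HTTP
        },
    }
    kustomization = {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1beta2",  # assumed version
        "kind": "Kustomization",
        "metadata": {"name": session, "namespace": namespace},
        "spec": {
            "interval": "30s",
            "path": path,
            "prune": True,
            "sourceRef": {"kind": "Bucket", "name": session},
        },
    }
    return bucket, kustomization
```

When the session ends, the reverse would apply: delete the temporary objects and set spec.suspend back to false so Flux restores the state described in git.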

JamWils avatar Sep 28 '22 23:09 JamWils

I am putting a note here so I do not forget to work out what it means to run port forwarding on leaf clusters in the enterprise version.

JamWils avatar Sep 29 '22 00:09 JamWils

I am closing this epic. The team has made numerous bug fixes and quality of life improvements to gitops run.

JamWils avatar Feb 15 '23 14:02 JamWils