
[Feature] Allow persisting the Tilt `EngineState` object outside the `tilt` binary

rrmistry opened this issue 9 months ago

Describe the Feature You Want

Previous feature request (before further investigation): allow a new command line flag to load an existing Tilt CI state when running `tilt ci`.

E.g.

tilt ci \
    --use-snapshot <path_to_snapshot_file> \                    👈
    --output-snapshot-on-exit <path_to_snapshot_file> \  👈
    --context ... \
    --file ...

[EDIT]:

Allow a new command line flag to read/write an existing Tilt `EngineState` when running `tilt ci` or `tilt up`.

E.g.

tilt up --engine-state-file tilt-engine-state.json

If the file already exists when tilt starts:

  • tilt first loads the EngineState into memory from the specified file before processing actions for resources.

On tilt exit:

  • the file is updated with the final EngineState object as it existed in Tilt's memory (a rough sketch of this load/save lifecycle follows).
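
This is not Tilt code; it is just the requested semantics as a short Python sketch, with a hypothetical file name and plain JSON standing in for the real EngineState:

import json
import os

STATE_FILE = "tilt-engine-state.json"  # hypothetical path passed via the proposed flag

def load_state(path):
    # Proposed start-up behavior: reuse the previously persisted state if the file exists.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def save_state(path, state):
    # Proposed exit behavior: write the final in-memory state back to the same file.
    with open(path, "w") as f:
        json.dump(state, f, indent=2)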

Current Behavior

Running `tilt ci` in a CI/CD pipeline re-deploys all resources every time.


Why Do You Want This?

We have a microservices architecture with a mono-repo setup. When deploying changes, we often deploy only one or two services at a time.

As we adopt Tilt into more of our workflows, we are noticing that in a CI/CD pipeline, having Tilt orchestrate at the top level (Mono-repo) causes all the microservices (sub-repos) to get redeployed even if no code has changed in most services.

This causes longer run times for CI and a lot of unnecessary churn on the cluster side. It also produces many unnecessary image tags for our microservices, with no real difference between versions for most tags.

It would be good if Tilt could persist the EngineState between CI/CD pipeline runs so that only the changed resources get deployed, not all of our services.


Additional context

We have 100+ resources within Tilt that get deployed to the cluster, and a typical run takes ~1 hour when it could be reduced to ~5 minutes if Tilt could persist its internal state between runs. There would be real-world dollar savings if this feature could be used $$$

Ideally, we are looking for Terraform-like external state management. But saving to a local file would be a good start instead of supporting cloud backends. For CI/CD systems, we can persist the state in external storage between CI runs, so local file storage is probably enough for our use case. Making state edits safe for parallel runs becomes the next problem, but so far Terraform solves this by simply locking/unlocking the state file (erroring out when locking fails).
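
As a rough illustration of the locking part (not something Tilt does today; the function name, path handling, and error message are just assumptions), Terraform-style behavior on a POSIX filesystem could look like:

import fcntl

def lock_state_file(path):
    # Take an exclusive, non-blocking lock on the state file and fail fast if another
    # run already holds it, mirroring how Terraform errors out on a lock conflict.
    f = open(path, "a+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        raise SystemExit("state file is locked by another run; aborting")
    return f  # hold the handle open for the run; the lock is released when it is closed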

rrmistry, Nov 22 '23

Upon looking further, I found that the existing command line parameter --output-snapshot-on-exit just saves the webview representation of the Tilt UI. It does not contain the internal state of Tilt (e.g. triggers, file hashes, dependencies, resource labels, etc.).

So the ask here needs to be expanded to include both import and export of Tilt's internal state, i.e. the EngineState object.

If this can be achieved, Tilt becomes an incremental build system as well 🤩, allowing for highly efficient deployment of changes in large-scale projects with many steps and long deployment run times.

rrmistry, Jan 03 '24

Hmm....I think there's a bunch of confusion and misunderstanding in this issue.

Tilt and Kubernetes are both reconcilers. If nothing has changed, then no pods will get redeployed.

This doc explains this in more detail: https://docs.tilt.dev/controlloop

This test demonstrates that if you run tilt up twice, the expected behavior is to keep the existing pod running: https://github.com/tilt-dev/tilt/blob/3359b56077d1bff23de1d179cb99431eefdd3c35/integration/idempotent_test.go#L16

If there's a bug where running tilt up twice causes an unexpected pod restart, that should be filed as a separate issue with repro steps, etc.

We're probably not going to implement an external state management system like Terraform, since Kubernetes already does this on its own control plane.

nicks, Mar 23 '24

Thanks @nicks !

This helped point us in the right direction. On further checking, we realized that the churn in our Kubernetes resources was coming from Helm, specifically the helm_resource extension:

https://github.com/tilt-dev/tilt-extensions/blob/fddc7ae3792be0ca304b283c2522594167e87c47/helm_resource/Tiltfile#L8

This was causing any change in the Helm chart to restart all pods in the chart, including those coming from chart dependencies, at least once per `tilt up` session.
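
For context, the pattern we were using looked roughly like this (the chart name and path are illustrative):

load('ext://helm_resource', 'helm_resource')

# helm_resource deploys the chart by invoking Helm itself, so any change to the chart
# re-runs the deploy against the cluster; that is where the churn was coming from for us.
helm_resource('my-chart', './charts/my-chart')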

Now we have switched to using the k8s_yaml method directly, which doesn't seem to cause (as much) churn. So thank you for pointing us in the right direction!


For anyone else finding this, our workaround was:

# Reference: https://docs.tilt.dev/api#api.helm
helm_output = helm(...)

# Reference: https://docs.tilt.dev/api.html#api.k8s_yaml
k8s_yaml(helm_output)
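
A slightly fuller version of the same workaround, with a hypothetical chart path, release name, and values file filled in. helm() renders the chart locally and k8s_yaml() hands the rendered manifests to Tilt, so no Helm release is created on the cluster:

# Render the chart locally and register the rendered manifests with Tilt.
helm_output = helm(
    './charts/my-chart',                       # path to the chart directory
    name='my-chart',                           # release name used for rendering
    namespace='default',
    values=['./charts/my-chart/values.yaml'],
)
k8s_yaml(helm_output)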

The downside of using k8s_yaml is that we can't group resources within Tilt as nicely as described in #6106.

In light of this, our workaround now is to define:

for w in [..., "my-chart-prometheus", "my-chart-grafana", ...]:
    k8s_resource(workload=w, labels=["infra"]) # <<< Just to define Tilt resource labels

for w in [..., "my-chart-vitess", "my-chart-neo4j", ...]:
    k8s_resource(workload=w, labels=["databases"]) # <<< Just to define Tilt resource labels

for w in [..., "my-chart-api-service1", "my-chart-api-service2", ...]:
    k8s_resource(workload=w, labels=["apis"]) # <<< Just to define Tilt resource labels
... 

Ideally, all the calls to k8s_resource(workload=w, labels=["something"]) would be expressed explicitly within the Kubernetes manifests using labels (e.g. tilt.dev/resource: infra), but the workaround above is good enough for now.
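
Until then, a slightly more compact sketch of the same workaround (using the same illustrative workload names) maps each label to its workloads once:

# One dict drives all the k8s_resource() calls, so adding a workload is a one-line change.
label_groups = {
    'infra': ['my-chart-prometheus', 'my-chart-grafana'],
    'databases': ['my-chart-vitess', 'my-chart-neo4j'],
    'apis': ['my-chart-api-service1', 'my-chart-api-service2'],
}

for label, workloads in label_groups.items():
    for w in workloads:
        k8s_resource(workload=w, labels=[label])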

rrmistry, Mar 24 '24