
tilt ci --output-snapshot-on-exit can race and capture the snapshot before all state is reconciled

Open · dnephin opened this issue 7 months ago · 1 comment

Expected Behavior

I'm expecting that the snapshot saved by --output-snapshot-on-exit contains all the relevant logs and status about the failure.

Current Behavior

In some cases (roughly 1 in 20 failures), the command fails but the snapshot doesn't reflect that failure. Two concrete examples:

Example 1 - The tilt ci run fails with the error Error: Custom build "custom-build-cmd" failed: exit status 1. When I open the snapshot, I don't see any tiles on the left marked as failed. If I look through every one, I eventually find the build failure. Its status is:

    "runtimeStatus": "pending",
    "updateStatus": "in_progress",

The logs do show the failure.

Example 2 - The tilt ci run fails with Error: exceeded grace period: Pod "some-test-gpv77" failed. This time the runtimeStatus is correctly "error", but the logs are incomplete. They are not truncated by the log buffer; what is missing is the final log lines containing the error message, not earlier ones.

Steps to Reproduce

Other than running a very large number of tilt ci runs on a CI worker, I'm not sure how to reliably reproduce this. I assume it's a race condition where shutdown happens before all the necessary events have been reconciled.
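The suspected race can be modeled with a reducer goroutine that applies status updates from a channel while the main path writes the exit snapshot. This is a hypothetical sketch: the action, store, and runAndSnapshot names are illustrative and do not correspond to Tilt's real store or engine code. It shows the general fix pattern, which is to stop accepting new actions, wait until every queued one has been applied, and only then snapshot.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical model only; not Tilt's actual types.
type action struct {
	resource string
	status   string
}

type store struct {
	mu    sync.Mutex
	state map[string]string
}

func (s *store) apply(a action) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.state[a.resource] = a.status
}

// snapshot copies the current state under the lock.
func (s *store) snapshot() map[string]string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]string, len(s.state))
	for k, v := range s.state {
		out[k] = v
	}
	return out
}

// runAndSnapshot feeds actions to a single reducer goroutine and takes
// the exit snapshot only after every queued action has been applied.
func runAndSnapshot(actions []action) map[string]string {
	st := &store{state: map[string]string{}}
	ch := make(chan action, len(actions))
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for a := range ch { // ends once ch is closed and fully drained
			st.apply(a)
		}
	}()
	for _, a := range actions {
		ch <- a
	}
	// A racy version would call st.snapshot() right here, while the
	// reducer may still hold queued actions such as the final "error".
	close(ch)
	wg.Wait()
	return st.snapshot()
}

func main() {
	snap := runAndSnapshot([]action{
		{"build-step", "in_progress"},
		{"build-step", "error"},
	})
	fmt.Println(snap["build-step"]) // error
}
```

The same drain-then-snapshot ordering would presumably apply to log lines as well, which matches the missing final logs in Example 2.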

Context

Observed on v0.33.21, not sure when it started. I'll be upgrading to the latest version now, but I assume it hasn't changed since.

About Your Use Case

We use tilt ci in CI to run an environment for end-to-end testing.

dnephin, May 15 '25 18:05

I'd be happy to submit a patch for this if you can point me in the right direction (specific files or packages to look at). I'm also happy to run a pre-release build to see if we can reproduce the issue with a patch applied.
