feat: prevent retries when terminal error happens
What does this PR do?
This PR:
- Improves log parsing logic a bit
- Adds a way to prevent retries when terminal errors happen such as "wrong dockerfile path".
- Moves the files around to make them a bit more pleasant to look
- Adds terminal error list
- Adds retry for our own db queries.
Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Chore (refactoring code, technical debt, workflow improvements)
- [x] Enhancement (small improvements)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
How should this be tested?
- Run your docker setup
docker compose -f ../deployment/docker-compose.yaml up mysql planetscale clickhouse redis s3 gw ctrl krane clickhouse -d --build
- Make a project and grab your projectID
- Make a local deployment
- Verify that you have logs in the CH
- You can also compare against depot.dev build logs to verify the integrity
- Now make a faulty deployment easiest way to do is update your dockerfile, just do this:
From:
COPY --from=builder /app/main .
To:
COPY --from=builder /app/something_stupid_power_move_slay .
This is enough to break depot flow.
- Now, make sure Restate doesn't retry and you don't end up with duplicated logs
โ ๏ธ No Changeset found
Latest commit: ec59a1728ccfae56a7c3d2b7af37812e8531ac4a
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
This PR includes no changesets
When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
Click here to learn what changesets are, and how to add one.
Click here if you're a maintainer who wants to add a changeset to this PR
The latest updates on your projects. Learn more about Vercel for GitHub.
| Project | Deployment | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| dashboard | Preview | Comment | Nov 18, 2025 11:36am | |
| engineering | Preview | Comment | Nov 18, 2025 11:36am |
- #4258
๐ (View in Graphite) - #4257

- #4256

- #4255

main
This stack of pull requests is managed by Graphite. Learn more about stacking.
๐ Walkthrough
Walkthrough
Introduce centralized terminal-vs-retriable build error classification and wrapping, add retry-wrapped DB ops, and refactor BuildKit status handling to buffer, sort, and emit chronological events via a larger status channel; update deploy workflow to treat terminal build errors as permanent failures.
Changes
| Cohort / File(s) | Summary |
|---|---|
Error handling go/apps/ctrl/services/build/backend/depot/errors.go |
New file: adds terminal error pattern matching (isTerminalBuildError) and a wrapBuildError helper that produces a Restate terminal error for matched messages or a Connect error otherwise. |
Build creation & status processing go/apps/ctrl/services/build/backend/depot/create_build.go |
Replaced inline error construction with wrapBuildError; introduced buildStatusChannelBuffer = 1000; renamed/refactored processBuildStatus โ processBuildStatusLogs, changed to buffer status events and deterministically emit ordered vertex completions and logs (sorting by timestamp), removed per-status immediate log emission. |
DB retries in depot project flow go/apps/ctrl/services/build/backend/depot/create_build.go (getOrCreateDepotProject) |
Wrapped project lookup and depot_project_id update calls in WithRetryContext to retry transient DB errors; update path now executes inside retry context (preserves prior retry semantics). |
Deploy workflow handling go/apps/ctrl/workflows/deploy/deploy_handler.go |
Detect terminal build errors via restate.IsTerminalError(err) during Docker build step; treat terminal errors as permanent failures (log and return non-retriable error), while preserving retry behavior for non-terminal errors. |
Sequence Diagram(s)
sequenceDiagram
participant Client
participant BuildSvc as Build Service
participant StatusBuf as Status Buffer
participant DB
participant ErrorWrap as wrapBuildError
participant Restate
Client->>BuildSvc: Request CreateBuild
BuildSvc->>DB: WithRetryContext(Get/Update depot project)
DB-->>BuildSvc: OK / Err
BuildSvc->>BuildSvc: Start Solve / connect BuildKit
BuildSvc->>StatusBuf: Send status events (buffered, channel size = 1000)
StatusBuf->>StatusBuf: Accumulate events
StatusBuf->>StatusBuf: Sort events by timestamp
StatusBuf->>BuildSvc: Emit ordered vertex completions & logs
alt Any failure encountered
BuildSvc->>ErrorWrap: wrapBuildError(err, code, msg)
alt ErrorWrap returns TerminalError
ErrorWrap->>Restate: TerminalError (no retry)
Restate->>Client: Permanent failure
else Non-terminal
ErrorWrap->>Client: Connect error (retryable)
Client->>Restate: Retry via restate
end
else Build completes successfully
BuildSvc->>Client: Success
end
Estimated code review effort
๐ฏ 4 (Complex) | โฑ๏ธ ~45 minutes
- Pay special attention to:
create_build.go: correctness of buffered status accumulation, sorting order, memory/latency implications, and channel sizing usage.errors.go: terminal pattern coverage and false-positive/negative risks in pattern matching.getOrCreateDepotProjectretry semantics: idempotency of updates and correct error propagation.deploy_handler.go: integration of terminal detection with existing retry and logging behavior.
Pre-merge checks and finishing touches
โ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | โ ๏ธ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
โ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | โ Passed | The title accurately captures the main feature: preventing retries when terminal errors occur, which aligns with the primary changes in the changeset. |
| Description check | โ Passed | The description provides comprehensive context including what the PR does, type of change, and detailed testing instructions; all required sections are completed with meaningful content. |
โจ Finishing touches
- [ ] ๐ Generate docstrings
๐งช Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch
deploy-logs/error-handling
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
Thank you for following the naming conventions for pull request titles! ๐
@Flo4604 I tried to add our uid instead of sha and it made the code a bit more crowded than it should be. Because, we need to track what sha actually point to which uid, I think its just not worth it. If that sha ever becomes problematic, I'm to going back and replacing it with uid. I say for now let's stick to what we have, wdyt?
@Flo4604 I tried to add our
uidinstead ofshaand it made the code a bit more crowded than it should be. Because, we need to track whatshaactually point to whichuid, I think its just not worth it. If thatshaever becomes problematic, I'm to going back and replacing it withuid. I say for now let's stick to what we have, wdyt?
fine with me