unkey icon indicating copy to clipboard operation
unkey copied to clipboard

feat: prevent retries when terminal error happens

Open ogzhanolguncu opened this issue 1 month ago โ€ข 7 comments

What does this PR do?

This PR:

  • Improves log parsing logic a bit
  • Adds a way to prevent retries when terminal errors happen such as "wrong dockerfile path".
  • Moves the files around to make them a bit more pleasant to look
  • Adds terminal error list
  • Adds retry for our own db queries.

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Chore (refactoring code, technical debt, workflow improvements)
  • [x] Enhancement (small improvements)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update

How should this be tested?

  • Run your docker setup
docker compose -f ../deployment/docker-compose.yaml up mysql planetscale clickhouse redis s3 gw ctrl krane clickhouse -d --build
  • Make a project and grab your projectID
  • Make a local deployment
  • Verify that you have logs in the CH
  • You can also compare against depot.dev build logs to verify the integrity
  • Now make a faulty deployment easiest way to do is update your dockerfile, just do this:
From:
COPY --from=builder /app/main .
To:
COPY --from=builder /app/something_stupid_power_move_slay .

This is enough to break depot flow.

  • Now, make sure Restate doesn't retry and you don't end up with duplicated logs

ogzhanolguncu avatar Nov 06 '25 13:11 ogzhanolguncu

โš ๏ธ No Changeset found

Latest commit: ec59a1728ccfae56a7c3d2b7af37812e8531ac4a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

changeset-bot[bot] avatar Nov 06 '25 13:11 changeset-bot[bot]

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
dashboard Ready Ready Preview Comment Nov 18, 2025 11:36am
engineering Ready Ready Preview Comment Nov 18, 2025 11:36am

vercel[bot] avatar Nov 06 '25 13:11 vercel[bot]

This stack of pull requests is managed by Graphite. Learn more about stacking.

ogzhanolguncu avatar Nov 06 '25 13:11 ogzhanolguncu

๐Ÿ“ Walkthrough

Walkthrough

Introduce centralized terminal-vs-retriable build error classification and wrapping, add retry-wrapped DB ops, and refactor BuildKit status handling to buffer, sort, and emit chronological events via a larger status channel; update deploy workflow to treat terminal build errors as permanent failures.

Changes

Cohort / File(s) Summary
Error handling
go/apps/ctrl/services/build/backend/depot/errors.go
New file: adds terminal error pattern matching (isTerminalBuildError) and a wrapBuildError helper that produces a Restate terminal error for matched messages or a Connect error otherwise.
Build creation & status processing
go/apps/ctrl/services/build/backend/depot/create_build.go
Replaced inline error construction with wrapBuildError; introduced buildStatusChannelBuffer = 1000; renamed/refactored processBuildStatus โ†’ processBuildStatusLogs, changed to buffer status events and deterministically emit ordered vertex completions and logs (sorting by timestamp), removed per-status immediate log emission.
DB retries in depot project flow
go/apps/ctrl/services/build/backend/depot/create_build.go (getOrCreateDepotProject)
Wrapped project lookup and depot_project_id update calls in WithRetryContext to retry transient DB errors; update path now executes inside retry context (preserves prior retry semantics).
Deploy workflow handling
go/apps/ctrl/workflows/deploy/deploy_handler.go
Detect terminal build errors via restate.IsTerminalError(err) during Docker build step; treat terminal errors as permanent failures (log and return non-retriable error), while preserving retry behavior for non-terminal errors.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BuildSvc as Build Service
    participant StatusBuf as Status Buffer
    participant DB
    participant ErrorWrap as wrapBuildError
    participant Restate

    Client->>BuildSvc: Request CreateBuild
    BuildSvc->>DB: WithRetryContext(Get/Update depot project)
    DB-->>BuildSvc: OK / Err

    BuildSvc->>BuildSvc: Start Solve / connect BuildKit
    BuildSvc->>StatusBuf: Send status events (buffered, channel size = 1000)
    StatusBuf->>StatusBuf: Accumulate events
    StatusBuf->>StatusBuf: Sort events by timestamp
    StatusBuf->>BuildSvc: Emit ordered vertex completions & logs

    alt Any failure encountered
        BuildSvc->>ErrorWrap: wrapBuildError(err, code, msg)
        alt ErrorWrap returns TerminalError
            ErrorWrap->>Restate: TerminalError (no retry)
            Restate->>Client: Permanent failure
        else Non-terminal
            ErrorWrap->>Client: Connect error (retryable)
            Client->>Restate: Retry via restate
        end
    else Build completes successfully
        BuildSvc->>Client: Success
    end

Estimated code review effort

๐ŸŽฏ 4 (Complex) | โฑ๏ธ ~45 minutes

  • Pay special attention to:
    • create_build.go: correctness of buffered status accumulation, sorting order, memory/latency implications, and channel sizing usage.
    • errors.go: terminal pattern coverage and false-positive/negative risks in pattern matching.
    • getOrCreateDepotProject retry semantics: idempotency of updates and correct error propagation.
    • deploy_handler.go: integration of terminal detection with existing retry and logging behavior.

Pre-merge checks and finishing touches

โŒ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage โš ๏ธ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
โœ… Passed checks (2 passed)
Check name Status Explanation
Title check โœ… Passed The title accurately captures the main feature: preventing retries when terminal errors occur, which aligns with the primary changes in the changeset.
Description check โœ… Passed The description provides comprehensive context including what the PR does, type of change, and detailed testing instructions; all required sections are completed with meaningful content.
โœจ Finishing touches
  • [ ] ๐Ÿ“ Generate docstrings
๐Ÿงช Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch deploy-logs/error-handling

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

โค๏ธ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 06 '25 13:11 coderabbitai[bot]

Thank you for following the naming conventions for pull request titles! ๐Ÿ™

github-actions[bot] avatar Nov 06 '25 14:11 github-actions[bot]

@Flo4604 I tried to add our uid instead of sha and it made the code a bit more crowded than it should be. Because, we need to track what sha actually point to which uid, I think its just not worth it. If that sha ever becomes problematic, I'm to going back and replacing it with uid. I say for now let's stick to what we have, wdyt?

ogzhanolguncu avatar Nov 13 '25 09:11 ogzhanolguncu

@Flo4604 I tried to add our uid instead of sha and it made the code a bit more crowded than it should be. Because, we need to track what sha actually point to which uid, I think its just not worth it. If that sha ever becomes problematic, I'm to going back and replacing it with uid. I say for now let's stick to what we have, wdyt?

fine with me

Flo4604 avatar Nov 13 '25 09:11 Flo4604