
feat: retry if buildkit not available

Open jLopezbarb opened this issue 4 months ago • 3 comments

Proposed changes

Partly fixes DEV-679. Depends on: #4528

  • Refactor buildkit solve build: to be able to retry a build a configurable number of times, we needed to refactor it.
  • Move buildkit errors to its own package.
  • Check buildkit gRPC errors. All transient gRPC errors that must be retried had code 13 (INTERNAL) or 14 (UNAVAILABLE).
  • Add unit tests checking all the possible scenarios. See pkg/build/buildkit/runner_test.go for the scenarios
  • Add the environment variable OKTETO_MAX_RETRIES_FOR_BUILDKIT_TRANSIENT_ERRORS to configure how many times a build will be attempted. (Needs documentation once we decide the naming convention.)

Scenarios tested

Tested all the scenarios using the following Dockerfile:

    FROM alpine
    RUN sleep 10000000000000

  • [ ] Scale down buildkit statefulset while building (Code 13)
  • [ ] kubectl delete pod (Code 13)
  • [ ] kubectl exec removing buildkit, pod restarts (Code 13)
  • [ ] kubectl exec removing build session (non-transient error)
  • [ ] kubectl exec removing command inside session (non-transient error)
  • [ ] kubectl drain node (Code 13)
  • [ ] kubectl delete node (Code 14)

CLI Quality Reminders 🔧

For both authors and reviewers:

  • Scrutinize for potential regressions
  • Ensure key automated tests are in place
  • Build the CLI and test using the validation steps
  • Assess Developer Experience impact (log messages, performance, etc.)
  • If too broad, consider breaking into smaller PRs
  • Adhere to our code style and code review guidelines

jLopezbarb, Oct 10 '24 14:10