feat: retry if buildkit not available
Proposed changes
Partially fixes DEV-679. Depends on: #4528
- Refactor the buildkit solve build: we needed to refactor it in order to retry the build a configurable number of times.
- Move buildkit errors to their own package.
- Check buildkit gRPC errors. All the transient gRPC errors we observed that must be retried had code 13 (Internal) or code 14 (Unavailable).
- Add unit tests checking all the possible scenarios. See pkg/build/buildkit/runner_test.go for the scenarios
- Add the environment variable `OKTETO_MAX_RETRIES_FOR_BUILDKIT_TRANSIENT_ERRORS` to configure how many times a build will be attempted. (Needs documentation once we decide on the naming convention.)
Scenarios tested
Tested all the scenarios using the following Dockerfile:
FROM alpine
RUN sleep 10000000000000
- [ ] Scale down buildkit statefulset while building (Code 13)
- [ ] kubectl delete pod (Code 13)
- [ ] kubectl exec removing buildkit, pod restarts (Code 13)
- [ ] kubectl exec removing build session (non-transient error)
- [ ] kubectl exec removing command inside session (non-transient error)
- [ ] kubectl drain node (Code 13)
- [ ] kubectl delete node (Code 14)
CLI Quality Reminders 🔧
For both authors and reviewers:
- Scrutinize for potential regressions
- Ensure key automated tests are in place
- Build the CLI and test using the validation steps
- Assess Developer Experience impact (log messages, performance, etc.)
- If too broad, consider breaking into smaller PRs
- Adhere to our code style and code review guidelines