android-components

TC tasks should handle external service outages

Open grigoryk opened this issue 6 years ago • 3 comments

Seeing this in a build: codecov.io is having service issues, and that currently results in a failed task. Our tasks shouldn't fail because of something like this. We could retry once or twice when possible, and gracefully handle external failures otherwise.

BUILD SUCCESSFUL in 6m 52s
196 actionable tasks: 195 executed, 1 up-to-date
+ automation/taskcluster/action/upload_coverage_report.sh
++ python automation/taskcluster/helper/get-secret.py -s project/mobile/android-components/public-tokens -k codecov -f .cc_token
++ export CI_BUILD_URL=https://tools.taskcluster.net/tasks/fH56-5PwRbCeXb5am2PImw
++ CI_BUILD_URL=https://tools.taskcluster.net/tasks/fH56-5PwRbCeXb5am2PImw
++ bash /dev/fd/63 -t @.cc_token
+++ curl -s https://codecov.io/bash
/dev/fd/63: line 1: html: No such file or directory
/dev/fd/63: line 2: syntax error near unexpected token `<'
/dev/fd/63: line 2: `<head><title>503 Service Temporarily Unavailable</title></head>

cc @mitchhentges


grigoryk avatar Nov 14 '18 18:11 grigoryk

Hey Grisha!

There are several ways to deal with retries:

  • either, within the script, you retry each command that accesses external resources. Releng has solved this in Python. For instance: https://github.com/mozilla-releng/scriptworker/blob/b09074dcd114254505b8c77e767b6c850c9fce3b/scriptworker/utils.py#L229-L269. It may be doable in bash: https://stackoverflow.com/questions/7449772/how-to-retry-a-command-in-bash
  • or, in the task payload, you define a list of exit codes for which Taskcluster will automatically rerun the task. For instance:
"payload" : {
  "onExitStatus": {
    "retry": [1]
  }
}

With this setting, the task is rerun whenever its command exits with status 1. Docs: https://github.com/taskcluster/docker-worker/blob/b9bb55d99cce0c90c52f59a70527402bf864bbc9/schemas/v1/payload.json#L198-L212

The latter solution may be the harder of the two to maintain, but it's probably the easier one to implement for bash.
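For the first option, here is a minimal sketch of an in-script retry in Python, assuming the flaky step can be wrapped as a single command. The names `run_with_retries` and `upload_coverage_best_effort`, the retry count, the delays, and the `codecov.sh` file name are all illustrative assumptions. This is not the scriptworker helper linked above, just the same idea in a few lines:

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Run cmd, retrying with exponential backoff while it exits non-zero.

    Returns True as soon as one attempt succeeds, False if all attempts fail.
    `sleep` is injectable so the backoff can be observed without waiting.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(
            f"attempt {attempt}/{attempts} failed "
            f"with exit code {result.returncode}",
            file=sys.stderr,
        )
        if attempt < attempts:
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return False


def upload_coverage_best_effort():
    """Hypothetical wrapper: fetch the uploader, but never fail the build."""
    # curl's --fail flag makes an HTTP error such as a 503 a non-zero exit,
    # so an error page is not saved and later executed as if it were a script.
    fetched = run_with_retries(
        ["curl", "--silent", "--fail", "--output", "codecov.sh",
         "https://codecov.io/bash"]
    )
    if not fetched:
        print("codecov.io unavailable, skipping coverage upload",
              file=sys.stderr)
    return 0  # the external outage is logged, not propagated
```

The wrapper always returns 0 on purpose: that is the "gracefully handle external failures" half of the request, while `run_with_retries` covers the "retry once or twice" half. Whether a skipped coverage upload should really be silent is a policy choice for the project.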

JohanLorenzo avatar Nov 19 '18 11:11 JohanLorenzo

➤ bifleming commented:

kbrosnan liuche how to proceed on this ticket?

data-sync-user avatar Jun 21 '21 13:06 data-sync-user

➤ stale[bot] commented:

See: https://github.com/mozilla-mobile/fenix/issues/17373 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

data-sync-user avatar Jun 21 '21 13:06 data-sync-user

Moved to bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1795077

Change performed by the Move to Bugzilla add-on.

csadilek avatar Oct 13 '22 16:10 csadilek