worker
Inform hub of un-requeueable job
A GCE job could not be started, and the subsequent attempt to requeue it also failed:
Oct 23 20:31:57 production-1-worker-com-c-5-gce level=error msg="couldn't start instance" err="context deadline exceeded"
Oct 23 20:31:57 production-1-worker-com-c-5-gce level=info msg="requeueing job"
Oct 23 20:31:57 production-1-worker-com-c-5-gce level=error msg="couldn't requeue job" err="context deadline exceeded"
As a result, hub was never informed that the job had failed, and only after quite some time did hub clean it up:
travis-com-hub-production
Erroring stale job: id=123 state=received updated_at=2017-10-23 18:26:41 UTC.
Number of occurrences of "couldn't requeue job" in the last 7 hours, grouped by hour (CEST):
Hour (CEST)  GCE .org  GCE .com
06           321       232
07           58        47
08           1         -
09           3         9
10           2         1
11           2         59
12           -         5
13           2         3
This means that a concurrency slot was occupied by the stale job for quite a while. It would be good if worker made further attempts to inform hub in this error scenario.
Some extra details are in this support ticket.