
Global setting for per-step timeout

Open evilmarty opened this issue 8 years ago • 12 comments

Per-step timeouts are supported in build pipelines via timeout_in_seconds and via the web interface, but it would be great to be able to set a default timeout_in_minutes, either as an agent option or as a build setting. By default the value could be zero, meaning no timeout.

The reason for this is to stop agents from getting stuck on jobs that either run exceptionally long or hang because of bugs. Making sure every step is configured with a timeout is difficult to manage, especially across numerous projects that keep their pipeline definitions in source control.

evilmarty, Jul 22 '16 04:07

@evilmarty

The link you provided for timeout_in_seconds no longer mentions a timeout option.

I can currently define a timeout through the web interface, as described in https://github.com/buildkite/feedback/issues/36.

But how do I put this timeout option in pipeline.yml?

Is that the same question being raised here?

Update

Thanks, @evilmarty

I searched again and found the documentation: https://buildkite.com/docs/pipelines/command-step

I can now add it to pipeline.yml:

timeout_in_minutes: 60

ozbillwang, May 03 '17 04:05

The docs have been updated and the step declaration examples were removed. The change is in the master branch of your docs, so maybe it's a regression?

My question is: how can I set a global timeout when none has been set in the UI or in a YAML file?

evilmarty, May 03 '17 04:05

I'm curious about this as well. Is there a way to have a global timeout that doesn't involve the web interface?

avtar, Feb 12 '18 21:02

Anyone? Bueller?

avtar, Mar 07 '18 19:03

I'd very much like to see an agent-level default job timeout so that frozen jobs don't run forever.

This is especially important because the scaling policy for https://github.com/buildkite/elastic-ci-stack-for-aws currently requires zero running jobs before scaling in, so a single frozen job can prevent scale-in and cost a lot of money on a large stack.

A configuration option on https://github.com/buildkite/agent would be great. However, I did a bit of exploration in the hope of opening a PR, and it looks like the timeout is driven server-side, so there's no good way to add the option to the agent without some backend changes.

pda, Jun 27 '18 03:06

I'm trying to think how this could work as agent configuration when timeouts are backend-driven.

It would be possible to implement an agent-side timeout. However, I don't think there's an existing way for the agent to report that a job timed out; it would look like a general command failure. The agent timeout could also race the server-side step timeout if the two values are close. The agent API could be extended to allow agent-driven timeouts, but that would still be racy and inconsistent with per-step timeouts. I don't think this is a good idea.

Instead, when an agent connects to the backend, it could advertise its default timeout, which could then be shown in the agent listing, etc. When a job is allocated to an agent, the backend would use the per-step timeout if present, and otherwise the agent's default timeout. Enforcing the timeout (per-step or per-agent) would remain backend-driven. That doesn't seem like such a bad option.

pda, Jun 27 '18 04:06

I think this is an important thing to fix! Will move discussion over to the PR.

keithpitt, Jul 03 '18 02:07

Just want to chime in to say this would be really useful: our elastic stack bill went through the roof because we didn't notice a few stuck jobs that prevented our stack from scaling down for ~3 weeks. 😱

BRMatt, Jun 12 '19 11:06

Sorry for yet another +1 comment, but this would be really useful.

goodspark, Oct 07 '21 00:10

+1, would be very useful

heidimhurst, May 10 '22 09:05

+1 would be very useful

samsarkleio, Jun 07 '22 16:06

FWIW, this now appears to be available in the UI under pipeline settings > builds; see the Changelog notes.


Suggest closing this issue, @evilmarty.

heidimhurst, Jul 26 '22 11:07