tink icon indicating copy to clipboard operation
tink copied to clipboard

GlobalTimeout appears unused

Open grahamc opened this issue 5 years ago • 14 comments

Right now, the global timeout is not actually used except at display-time in the portal project:

[grahamc@Petunia:~/projects/github.com/tinkerbell]$ rg GlobalTimeout
tink/executor/types.go
7:	GlobalTimeout int    `yaml:"global_timeout"`

tink/db/workflow.go
29:		GlobalTimeout int    `yaml:"global_timeout"`

portal/src/client/workflow.go
123:	wf.Timeout = strconv.Itoa(details.GlobalTimeout)

portal/src/pkg/types/workflow.go
26:	GlobalTimeout int    `yaml:"global_timeout"`

I propose the following semantics. Given a workflow:

global_timeout: 30
tasks:
- name: "example"
  worker: "{{.device_1}}"
  actions:
  - name: "phase-1"
    image: hello-world
    timeout: 20
    command: sleep 20
  - name: "phase-2"
    image: hello-world
    timeout: 20
    command: sleep 20
  1. phase-1 would execute to completion, take 20 seconds, and the entire workflow would have 10 seconds to complete the rest of the phases.
  2. phase-2 would begin, run for 10 seconds, be killed, and the workflow is canceled.

grahamc avatar Jun 30 '20 13:06 grahamc

@grahamc I have a suggestion. Since we are parsing the yaml before creating a template. We can add the timeout of all the actions and then we can comapre it with global timeout. If it exceeds we can simply say that this template is invalid.

parauliya avatar Jul 13 '20 08:07 parauliya

I like the idea to validate it at the API layer as @parauliya suggested I presume

gianarb avatar Aug 05 '20 07:08 gianarb

As discussed, There is no use of global_timeout as we have action level timeout so this needs to be removed from the template and code itself.

parauliya avatar Aug 27 '20 06:08 parauliya

As discussed, There is no use of global_timeout as we have action level timeout so this needs to be removed from the template and code itself.

There was a misunderstanding with the use of "global_timeout" but finally it was decided to have it in the template and add the support in the code.

parauliya avatar Aug 27 '20 14:08 parauliya

I had a chat with a bunch of people and we agreed on how this feature will be implemented.

The GlobalTimeout is a workflow attribute that acts as a safeguard to guarantee that workflow will end. Tinkerbell can't get to a point where workflow runs forever. The global timeout needs to have a default value in tink-server (15 minutes), the value has to be configurable with a flag or env var when the tink-server starts.

A template can have a global_timeout field that overrides the default value set in the tink-server, 0 is not a supported value and the template creation should return a validation error.

gianarb avatar Sep 09 '20 11:09 gianarb

The global timeout needs to have a default value in tink-server (15 minutes), the value has to be configurable with a flag or env var when the tink-server starts. Workflow supports a global_timeout, 0 is not a supported value.

@gianarb Does this mean global_timeout will no longer be part of the workflow template?

gauravgahlot avatar Sep 09 '20 11:09 gauravgahlot

@gauravgahlot I modified the comment to make it even more explicit, let me know if it helps

gianarb avatar Sep 09 '20 12:09 gianarb

@gianarb Sure it does. Thank you! 👍🏼

gauravgahlot avatar Sep 09 '20 12:09 gauravgahlot

I had a chat with a bunch of people and we agreed on how this feature will be implemented.

The GlobalTimeout is a workflow attribute that acts as a safeguard to guarantee that workflow will end. Tinkerbell can't get to a point where workflow runs forever. The global timeout needs to have a default value in tink-server (15 minutes), the value has to be configurable with a flag or env var when the tink-server starts.

A template can have a global_timeout field that override the default value set in the tink-server, 0 is not a supported value.

@gianarb If a template has global_timeout set to 0, should we ignore and use the default one(15 min) or we should give error while creating such template?

parauliya avatar Sep 09 '20 12:09 parauliya

should we ignore and use the default one(15 min) or we should give error while creating such template?

we should return an error because the template is not valid

gianarb avatar Sep 09 '20 13:09 gianarb

Can this issue be closed now?

tstromberg avatar Jul 27 '21 16:07 tstromberg

@tstromberg I don't think if this was addressed and I believe it should be. So, we should not close this issue.

gauravgahlot avatar Jul 27 '21 17:07 gauravgahlot

Looks like this could use a PR. Also, maybe we can start by adding some code references to where is could be started or has been started.

jacobweinstock avatar Sep 21 '21 15:09 jacobweinstock

Relabeling to important-longterm because this is still important, but is probably not going to be worked on in the short term.

nshalman avatar Nov 02 '21 15:11 nshalman

We intend on refactoring the new Template CRD that was based on the original template JSON. We'll take this into consideration as part of that refactor (https://github.com/tinkerbell/tink/issues/661).

chrisdoherty4 avatar Dec 24 '22 19:12 chrisdoherty4