
FR: Task-level (and maybe Pipeline-level) resource requests and limits

Open · lbernick opened this issue 3 years ago · 15 comments

Feature request

Currently, users can set resource requests and limits at the Step level, which are summed to determine the resource requests of the pod that runs a TaskRun. However, some users want to be able to directly specify the resource requests for a Task, rather than per Step.
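To make the summing concrete, here is a minimal Go sketch. It models only CPU requests, as plain integer millicores rather than Kubernetes `resource.Quantity` values. The `Step` type and the `podCPURequest` function are hypothetical, and sidecars and init containers are ignored. Steps do their work one after another, but the pod still reserves the sum of all Step requests.

```go
package main

import "fmt"

// Step is a hypothetical, simplified stand-in for a Tekton Step: only its
// CPU request (in millicores) is modeled.
type Step struct {
	Name          string
	CPURequestMil int64
}

// podCPURequest returns the CPU request the scheduler accounts for when every
// Step runs as a regular container: the sum of the containers' requests.
func podCPURequest(steps []Step) int64 {
	var total int64
	for _, s := range steps {
		total += s.CPURequestMil
	}
	return total
}

func main() {
	steps := []Step{{"clone", 500}, {"build", 1000}, {"push", 500}}
	fmt.Printf("pod CPU request: %dm\n", podCPURequest(steps))
}
```

With these three Steps the pod requests 2000m (2 CPU), even though the largest single Step only asks for 1000m.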

Related issues

  • #2986: an FR for Task-level resource requests (however, the Step resource request behavior described is no longer true)
  • #4271: FR for Pipeline-level resource requests
  • #4347: Confusion that step-level resource requests are summed

lbernick, Jan 12 '22

/kind design /assign

lbernick, Mar 09 '22

A bit of a wrinkle here. Let's say we want to allow people to specify the total resource requests and limits to be used by all Steps in their Task:

  • k8s states that containers without resource limits are considered to have higher limits than those with limits configured (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources)
  • If the task resource limit is applied to only one container, the pod will therefore not have an effective limit.
  • If the task limit is applied to each container, the pod has a much higher limit than desired. This is especially problematic if requests are not set, because the request will then automatically be set to the same value as the limit, and the pod may have difficulty being scheduled.
  • If the task limit is spread out among containers, a task where one step is more resource-intensive than all the others could get OOMKilled or throttled.
  • Container resource requirements can't be updated once the pod is running, so we can't dynamically adjust them as steps run.
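The three ways of mapping a Task limit onto Step containers listed above can be compared with a small Go sketch. It assumes memory limits expressed as integer MiB and uses a hypothetical `unlimited` sentinel for a container with no limit. It computes the pod's effective limit as the sum of the regular containers' limits, treating any missing limit as unbounded, per the Kubernetes rule linked above. All function names are illustrative, not Tekton code.

```go
package main

import "fmt"

// unlimited marks a container with no limit set; Kubernetes treats a missing
// limit as the highest possible limit.
const unlimited int64 = -1

// podEffectiveLimit sums per-container limits (regular containers only).
// If any container is unlimited, the pod as a whole has no effective limit.
func podEffectiveLimit(limits []int64) int64 {
	var total int64
	for _, l := range limits {
		if l == unlimited {
			return unlimited
		}
		total += l
	}
	return total
}

// applyToOne sets the Task limit on the first container only.
func applyToOne(taskLimit int64, n int) []int64 {
	limits := make([]int64, n)
	for i := range limits {
		limits[i] = unlimited
	}
	if n > 0 {
		limits[0] = taskLimit
	}
	return limits
}

// applyToEach gives every container the full Task limit.
func applyToEach(taskLimit int64, n int) []int64 {
	limits := make([]int64, n)
	for i := range limits {
		limits[i] = taskLimit
	}
	return limits
}

// spreadEvenly splits the Task limit evenly across containers
// (integer division; any remainder is dropped in this sketch).
func spreadEvenly(taskLimit int64, n int) []int64 {
	if n <= 0 {
		return nil
	}
	limits := make([]int64, n)
	for i := range limits {
		limits[i] = taskLimit / int64(n)
	}
	return limits
}

func main() {
	const taskLimit, steps = 2048, 4 // a 2 GiB Task limit across 4 Steps
	fmt.Println(podEffectiveLimit(applyToOne(taskLimit, steps)))  // -1: no effective limit
	fmt.Println(podEffectiveLimit(applyToEach(taskLimit, steps))) // 8192: four times too high
	fmt.Println(spreadEvenly(taskLimit, steps))                   // [512 512 512 512]
}
```

The even split keeps the pod total at 2048 MiB, but a Step that briefly needs 1 GiB would exceed its 512 MiB share, which is the OOMKilled/throttling risk described above.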

One way around this is to support only Task-level resource requests, not limits.

Another option is to run Steps as init containers rather than containers (instead of supporting this feature). As a result, k8s will correctly determine the effective pod resource requirements. We would have to rework our entrypoint but I imagine it would be doable. However, we would have no way to support Sidecars.
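The reason init containers would "correctly" size the pod is the Kubernetes rule for a single resource: the effective init request is the highest request among init containers, and the pod's effective request is the higher of that value and the sum of the app containers' requests. A minimal Go sketch of that rule, with requests as plain integer millicores, follows. `effectiveRequest` is a hypothetical helper, and pod overhead and sidecar containers are not modeled.

```go
package main

import "fmt"

// effectiveRequest applies the Kubernetes rule for one resource:
// max(highest init-container request, sum of app-container requests).
func effectiveRequest(initContainers, appContainers []int64) int64 {
	var maxInit int64
	for _, r := range initContainers {
		if r > maxInit {
			maxInit = r
		}
	}
	var sumApp int64
	for _, r := range appContainers {
		sumApp += r
	}
	if maxInit > sumApp {
		return maxInit
	}
	return sumApp
}

func main() {
	steps := []int64{500, 1000, 500} // CPU requests in millicores
	// Steps as regular containers: requests are summed.
	fmt.Println(effectiveRequest(nil, steps)) // 2000
	// Steps as init containers: only the largest request counts.
	fmt.Println(effectiveRequest(steps, nil)) // 1000
}
```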

lbernick, Mar 10 '22

> Another option is to run Steps as init containers rather than containers (instead of supporting this feature). As a result, k8s will correctly determine the effective pod resource requirements. We would have to rework our entrypoint but I imagine it would be doable. However, we would have no way to support Sidecars.

We used to run Tekton steps in init containers. Init containers have some nice characteristics but also come with a bunch of drawbacks: see https://github.com/tektoncd/pipeline/issues/224

dibyom, Mar 14 '22

IMO, https://github.com/tektoncd/pipeline/pull/4176 made defining CPU/memory requests much less intuitive. Assuming Steps in a Task will never run in parallel, a Task-level resource definition would definitely alleviate some of that pain.

joshuasimon-taulia, Mar 16 '22

Design doc (must join tekton-dev or tekton-users to view/comment)

lbernick, Mar 21 '22

/kind tep

lbernick, Mar 28 '22

Opened TEP-0104 to propose this feature.

lbernick, Apr 08 '22

Hi @lbernick, perhaps I could pick this issue up?

I'm digesting the required logic from the TEP and aim to raise a draft PR this week.

austinzhao-go, May 09 '22

/assign @austinzhao-go

Thanks Austin!

lbernick, May 09 '22

I want to amend what I said in https://github.com/tektoncd/pipeline/issues/4470#issuecomment-1064464624: since container limits are enforced on individual containers by the container runtime, and pod effective limits seem less relevant than pod effective requests, I've updated the design in https://github.com/tektoncd/community/pull/703 to propose support for Task-level limits as well.

lbernick, May 11 '22

Thanks for this update, @lbernick!

Just to confirm my understanding of the update:

  • A Task-level limits field will be added and its values written into each Step (the Pod passes the scheduler check with a higher total amount, but the limits are still enforced per container at runtime)
  • A resources field (requests + limits) will be added in 2 places:
    • Task.spec.resources
    • PipelineRun.TaskRunSpecs.resources (I think this will override the Task.* values if both are specified, since runtime configuration takes precedence)
  • Sidecar resource requirements will be specified separately from the Task-level ones, per sidecar (i.e. kept as they are now; there will NOT be a resources field under sidecars that applies to all sidecars)

Not changed:

  • The final requirements (after applying the Task-level resource requirements) will be written into the last Step to take effect.
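As a rough sketch of the precedence and limit handling summarized above: the runtime value from `PipelineRun.TaskRunSpecs` wins over `Task.spec.resources`, and the resolved limits are copied onto every Step. The `Resources` type and the `resolveTaskResources`/`applyLimitsToSteps` functions are hypothetical names, not Tekton's actual API. Quantities are plain integers where 0 means "not set", and where requests end up and how sidecars are handled are deliberately not modeled.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified stand-in for Kubernetes
// ResourceRequirements: CPU limit in millicores, memory limit in MiB.
// A value of 0 means "not set".
type Resources struct {
	CPULimitMil int64
	MemLimitMiB int64
}

// resolveTaskResources prefers the runtime value (e.g. from
// PipelineRun.TaskRunSpecs) and falls back to the Task spec.
func resolveTaskResources(taskSpec, runtime *Resources) *Resources {
	if runtime != nil {
		return runtime
	}
	return taskSpec
}

// applyLimitsToSteps copies the Task-level limits onto every Step, so each
// container is still limited individually by the container runtime.
func applyLimitsToSteps(task *Resources, steps []Resources) []Resources {
	out := make([]Resources, len(steps))
	copy(out, steps)
	if task == nil {
		return out
	}
	for i := range out {
		// Only overwrite limits that are actually set at the Task level.
		if task.CPULimitMil > 0 {
			out[i].CPULimitMil = task.CPULimitMil
		}
		if task.MemLimitMiB > 0 {
			out[i].MemLimitMiB = task.MemLimitMiB
		}
	}
	return out
}

func main() {
	taskSpec := &Resources{CPULimitMil: 1000, MemLimitMiB: 512}
	runtime := &Resources{CPULimitMil: 2000, MemLimitMiB: 1024}
	resolved := resolveTaskResources(taskSpec, runtime)
	fmt.Println(applyLimitsToSteps(resolved, make([]Resources, 3)))
	// [{2000 1024} {2000 1024} {2000 1024}]
}
```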

austinzhao-go, May 11 '22

Yup that's correct @austinzhao-go !

lbernick, May 11 '22

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot, Aug 09 '22

/remove-lifecycle stale

We really need an intuitive way to set requests/limits for the entire pod/Task.

joshuasimon-taulia, Aug 09 '22

/lifecycle frozen

This should be ready soon :)

lbernick, Aug 09 '22

This feature is available on main now and will be out in the next release. We won't be implementing Pipeline-level resource requirements for the foreseeable future (it's hard to have a non-confusing API for this when some components run in parallel and some sequentially), although if there's a strong use case for it we can reconsider.
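To illustrate why a single Pipeline-level number is ambiguous, here is a minimal Go sketch. It simplifies a Pipeline into sequential stages whose Tasks run in parallel. Real Pipelines are DAGs, and each TaskRun gets its own pod, so this is only an approximation. `peakRequest` is a hypothetical helper. Parallel Tasks add up, while sequential stages only contribute their largest total, so "2 CPU for the Pipeline" could mean a per-Task cap, a peak total, or a budget to be split.

```go
package main

import "fmt"

// peakRequest models a Pipeline as a sequence of stages: Tasks within a stage
// run in parallel (their requests add up), while stages run one after another
// (only the largest stage total matters at any moment).
func peakRequest(stages [][]int64) int64 {
	var peak int64
	for _, stage := range stages {
		var sum int64
		for _, r := range stage {
			sum += r
		}
		if sum > peak {
			peak = sum
		}
	}
	return peak
}

func main() {
	// CPU requests in millicores: one clone Task, then two Tasks in
	// parallel, then one deploy Task.
	stages := [][]int64{{500}, {1000, 1000}, {250}}
	fmt.Println(peakRequest(stages)) // 2000
}
```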

lbernick, Aug 16 '22