FR: Task-level (and maybe Pipeline-level) resource requests and limits
Feature request
Currently, users can set resource requests and limits at the Step level, which are summed to determine the resource requests of the pod that runs a TaskRun. However, some users want to be able to directly specify the resource requests for a Task, rather than per Step.
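To make the summing behavior concrete, here is a minimal Python sketch. The step names and values are hypothetical, and quantities are simplified to integer millicores rather than Kubernetes quantity strings.

```python
from typing import Dict

# Hypothetical per-Step CPU requests in millicores (illustrative values only).
step_cpu_requests: Dict[str, int] = {"clone": 500, "build": 1000, "test": 250}


def pod_cpu_request(step_requests: Dict[str, int]) -> int:
    """Sum the per-Step requests, mirroring how Step-level requests
    add up to the request of the pod that runs the TaskRun."""
    return sum(step_requests.values())


print(pod_cpu_request(step_cpu_requests))  # 1750
```

Even though the Steps run one after another, the pod in this sketch asks for 1750m, not the 1000m that the largest Step needs on its own.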
Related issues
- #2986: a FR for Task-level resource requests (however, the Step resource request behavior described is no longer true)
- #4271: FR for Pipeline-level resource requests
- #4347: Confusion that step-level resource requests are summed
/kind design /assign
There's a bit of a wrinkle here. Let's say we want to allow people to specify the total resource requests and limits that should be used by all steps in their task:
- k8s states that containers without resource limits are considered to have higher limits than those with limits configured (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources)
- If the task resource limit is applied to only one container, the pod will therefore not have an effective limit.
- If the task limit is applied to each container, the pod has a much higher limit than desired. This is especially problematic if requests are not set, because the request will then automatically be set to the same value as the limit, and the pod may have difficulty being scheduled.
- If the task limit is spread out among containers, a task where one step is more resource intensive than all the others could get oomkilled or throttled.
- resource requirements can't be updated, so we can't dynamically adjust them as steps run.
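The three ways of distributing a Task-level limit described above can be compared with a rough Python sketch. It assumes the pod's effective limit is the sum of its containers' limits and that a single container without a limit leaves the pod effectively unbounded (modeled as `None`). The function names and the 3072 MiB figure are hypothetical.

```python
from typing import List, Optional


def effective_pod_limit(container_limits: List[Optional[int]]) -> Optional[int]:
    """Return the summed limit, or None (unbounded) if any container has no limit."""
    if any(limit is None for limit in container_limits):
        return None
    return sum(limit for limit in container_limits if limit is not None)


def apply_to_one(task_limit: int, n_steps: int) -> List[Optional[int]]:
    """Put the whole Task limit on the first container only."""
    return [task_limit] + [None] * (n_steps - 1)


def apply_to_each(task_limit: int, n_steps: int) -> List[Optional[int]]:
    """Copy the Task limit onto every container."""
    return [task_limit] * n_steps


def spread_evenly(task_limit: int, n_steps: int) -> List[Optional[int]]:
    """Divide the Task limit equally among containers."""
    return [task_limit // n_steps] * n_steps


TASK_LIMIT_MIB = 3072  # hypothetical Task-level memory limit
N_STEPS = 3

print(effective_pod_limit(apply_to_one(TASK_LIMIT_MIB, N_STEPS)))   # None
print(effective_pod_limit(apply_to_each(TASK_LIMIT_MIB, N_STEPS)))  # 9216
print(spread_evenly(TASK_LIMIT_MIB, N_STEPS))                       # [1024, 1024, 1024]
```

In the last case the pod total matches the Task limit, but a Step that needs, say, 2048 MiB would exceed its 1024 MiB share.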
One way around this is to support only Task-level resource requests, not limits.
Another option is to run Steps as init containers rather than containers (instead of supporting this feature). As a result, k8s will correctly determine the effective pod resource requirements. We would have to rework our entrypoint but I imagine it would be doable. However, we would have no way to support Sidecars.
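As a rough illustration of why init containers would give the desired result, the sketch below follows the rule from the Kubernetes docs linked above: the effective pod request is the higher of the largest init container request and the sum of the app container requests. It ignores sidecars and pod overhead, and the values are hypothetical.

```python
from typing import List


def effective_pod_request(init_requests: List[int], app_requests: List[int]) -> int:
    """Higher of the largest init-container request and the summed app-container requests."""
    return max(max(init_requests, default=0), sum(app_requests))


# Hypothetical per-Step CPU requests in millicores.
steps = [500, 1000, 250]

# Steps as regular containers: requests are summed.
print(effective_pod_request(init_requests=[], app_requests=steps))  # 1750

# Steps as init containers: only the largest request counts.
print(effective_pod_request(init_requests=steps, app_requests=[]))  # 1000
```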
We used to run Tekton in init containers. They have some nice characteristics but also come with a bunch of drawbacks: see https://github.com/tektoncd/pipeline/issues/224
imo, https://github.com/tektoncd/pipeline/pull/4176 made defining cpu/mem requests much less intuitive. assuming steps in a task will never run in parallel, task-level resource definition would definitely alleviate some of that pain.
/kind tep
Opened TEP-0104 to propose this feature.
Hi @lbernick, perhaps I could pick this issue up?
I'm digesting the required logic from the TEP and aim to raise a draft PR this week.
/assign @austinzhao-go
Thanks Austin!
I want to amend what I said in https://github.com/tektoncd/pipeline/issues/4470#issuecomment-1064464624; since the container limits are enforced on individual containers by the container runtime, and pod effective limits seem less relevant than pod effective requests, I've updated the design in https://github.com/tektoncd/community/pull/703 to propose support for Task-level limits as well.
Thanks for this update, @lbernick.
Just to confirm my understanding of the update:
- A task-level limit field will be added and written to the step level (the Pod will pass the scheduler check with a higher total amount, but the limit is still enforced at the container level at runtime).
- A resources field (requests + limits) will be added in 2 places:
  - Task.spec.resources
  - PipelineRun.TaskRunSpecs.resources (I think this will override the Task.* one if both are specified, since the runtime value takes precedence)
- Sidecar containers will have their resource requirements specified separately from the task level, per sidecar (so this stays as it is now, and there will NOT be a resources field under sidecars that would apply to all sidecars).
Not changed:
- The final requirements (after applying the task-level resource requirements) will be written into the last step to take effect.
Yup that's correct @austinzhao-go !
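The precedence and limit handling confirmed above could be sketched roughly as follows. This is only an illustration of the comment's wording, not Tekton's implementation: the function names and dictionary shape are hypothetical, and it assumes the runtime value replaces the Task value wholesale rather than field by field.

```python
from typing import Dict, List, Optional

# Hypothetical shape, e.g. {"requests": {"cpu": "1"}, "limits": {"memory": "2Gi"}}.
Resources = Dict[str, Dict[str, str]]


def resolve_task_resources(
    task_level: Optional[Resources], run_level: Optional[Resources]
) -> Optional[Resources]:
    """Prefer the runtime (PipelineRun.TaskRunSpecs) value when both are specified."""
    return run_level if run_level is not None else task_level


def limits_per_step(
    resources: Optional[Resources], step_names: List[str]
) -> Dict[str, Dict[str, str]]:
    """Copy the task-level limits onto every step; each container's limit
    is then enforced separately at runtime."""
    limits = (resources or {}).get("limits", {})
    return {name: dict(limits) for name in step_names}


task_resources: Resources = {"limits": {"memory": "1Gi"}}
run_resources: Resources = {"limits": {"memory": "2Gi"}}

resolved = resolve_task_resources(task_resources, run_resources)
print(limits_per_step(resolved, ["build", "test"]))
# {'build': {'memory': '2Gi'}, 'test': {'memory': '2Gi'}}
```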
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
/remove-lifecycle stale
We really need an intuitive way to set requests/limits for the entire pod/task.
/lifecycle frozen
This should be ready soon :)
This feature is available on main now and will be out in the next release. We won't be implementing Pipeline-level resource requirements for the foreseeable future (it's hard to have a non-confusing API for this when some components run in parallel and some sequentially), although if there's a strong use case for it we can reconsider.