
Limit total number of pods created by the controller

Open • alexec opened this issue 4 years ago • 17 comments

Summary

Kubernetes clusters often max out at a certain number of scheduled and running pods. For us, limiting this to 5k allows us to protect ourselves.

Use Cases

Prevent broken clusters. Protect other cluster users.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

alexec • Nov 19 '20 18:11

@jessesuen @sarabala1979 thoughts?

alexec • Nov 19 '20 18:11

This would be very welcome, but it should definitely be configurable. With the latest scalability improvements in v2.12 we were finally able to hit over 12K concurrent pods on our EKS cluster yesterday, and that made us very happy.

ebr • Nov 19 '20 19:11

Pods are created at the workflow level, not at the global controller level. Option 1: a workflow could have a podCreateStrategy that the user configures to limit pod creation, with a delay between batches. We need to take the 10s reconciliation into account.

podCreateStrategy:
    maxCreation: 500
    interval: 2s
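One way to read option 1 is that, on each reconciliation, a workflow may create at most maxCreation pods for every interval that has elapsed. The Python sketch below models that reading only; PodCreateStrategy and allowed_now are hypothetical names mirroring the proposed YAML fields, not an existing Argo Workflows API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class PodCreateStrategy:
    """Hypothetical mirror of the proposed podCreateStrategy fields."""
    max_creation: int    # pods released per batch (maxCreation)
    interval: timedelta  # delay between batches (interval)

    def __post_init__(self) -> None:
        if self.max_creation <= 0:
            raise ValueError("max_creation must be positive")
        if self.interval <= timedelta(0):
            raise ValueError("interval must be positive")


def allowed_now(elapsed: timedelta, created: int, pending: int,
                strategy: PodCreateStrategy) -> int:
    """How many of the `pending` pods may be created at this reconciliation.

    `elapsed` is the time since the workflow started creating pods and
    `created` is how many pods it has created so far. The first batch is
    released immediately, then one more batch per full interval.
    """
    released_batches = elapsed // strategy.interval + 1
    budget = released_batches * strategy.max_creation - created
    return max(0, min(pending, budget))
```

Because this check only runs when the workflow is reconciled, a 2s interval combined with a 10s reconciliation cycle would release about six batches at once, which illustrates why the reconciliation period matters for this option.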

Option 2: refactor pod creation into a common queue-based worker with a rate limit.
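Option 2 could look roughly like this: instead of each workflow creating pods directly, pod requests go onto one shared queue, and a single worker drains it through a rate limiter. This is a minimal sketch assuming a token bucket (one common rate-limiting technique, chosen here for illustration); TokenBucket and PodCreationQueue are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow.

```python
import collections
from typing import List, Optional


class TokenBucket:
    """Token-bucket limiter: holds up to `burst` tokens, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last: Optional[float] = None  # time of the previous call

    def allow(self, now: float) -> bool:
        # Refill according to the time elapsed since the previous call.
        if self.last is not None:
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PodCreationQueue:
    """Hypothetical shared queue: all workflows enqueue pod requests here and
    one worker creates them no faster than the limiter allows."""

    def __init__(self, limiter: TokenBucket) -> None:
        self.limiter = limiter
        self.pending: collections.deque = collections.deque()

    def enqueue(self, pod_name: str) -> None:
        self.pending.append(pod_name)

    def drain(self, now: float) -> List[str]:
        """Return the pods the worker would create at time `now`."""
        created = []
        # Stop as soon as the queue is empty or the limiter says no.
        while self.pending and self.limiter.allow(now):
            created.append(self.pending.popleft())
        return created
```

Pods that are not created in one pass simply stay queued for the next one, so bursts from many workflows are smoothed into one controller-wide creation rate.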

Option 3: configure the API rate limit on the Kubernetes API server, although I am not sure that is possible.

sarabala1979 • Nov 19 '20 19:11

This can be achieved using resource quotas. While resource quotas are scoped to a namespace rather than the whole cluster, I hope that works for most use cases:

https://kubernetes.io/docs/concepts/policy/resource-quotas/

I think it would look like this:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: max-workflows
spec:
  hard:
    count/workflows.argoproj.io: "0"

alexec • May 27 '21 19:05

Hi @alexec, are you planning to implement this one? We find this feature very important. We had issues when the number of pods in the cluster exceeded a certain number. Hence, a global parallelism value that dictates the total number of pods the controller can create would be very helpful.

amitm02 • May 30 '21 09:05

There is already a feature called parallelism, which limits the number of workflows (not pods) being reconciled; you could use that. You could also use resource quotas to limit pods and workflows on a per-namespace basis. We do not plan to implement a global limit, but you're welcome to submit a PR.
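For contrast with parallelism, which counts workflows, the global pod limit requested in this thread could be modelled as a controller-wide counter that every workflow consults before creating a pod. The sketch below is hypothetical (GlobalPodLimiter is not part of Argo Workflows) and assumes the controller calls release() whenever a pod reaches Succeeded or Failed; the cap is a constructor argument, so it stays configurable.

```python
import threading


class GlobalPodLimiter:
    """Hypothetical controller-wide cap on non-terminal pods.

    Any workflow must call try_acquire() before creating a pod; when that
    pod reaches a terminal phase (Succeeded or Failed), release() frees the slot.
    """

    def __init__(self, max_pods: int) -> None:
        if max_pods <= 0:
            raise ValueError("max_pods must be positive")
        self._max = max_pods
        self._active = 0
        self._lock = threading.Lock()  # workflows may be reconciled concurrently

    def try_acquire(self) -> bool:
        with self._lock:
            if self._active >= self._max:
                return False  # caller should requeue the workflow and retry later
            self._active += 1
            return True

    def release(self) -> None:
        with self._lock:
            if self._active == 0:
                raise RuntimeError("release() called with no active pods")
            self._active -= 1

    @property
    def active(self) -> int:
        with self._lock:
            return self._active
```

A workflow that gets False from try_acquire() would simply be retried on a later reconciliation rather than failing, which keeps the cluster under the cap without dropping work.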

alexec • May 30 '21 15:05

@alexec Hello, has this issue been fully fixed?

adeniyistephen • Feb 22 '22 22:02

@adeniyistephen No, not implemented. Would you like to help by submitting a PR?

alexec • Feb 22 '22 22:02

@alexec Awesome, yes! I would love to. I submitted it for my issue #7877.

adeniyistephen • Feb 22 '22 22:02

@adeniyistephen are you working on this issue?

sarabala1979 • May 12 '22 16:05

Not yet, I haven't started. I have been quite busy; I will take it on soonish, but if someone wants to jump on it, please feel free.

adeniyistephen • May 12 '22 20:05

Removing assignee as there hasn't been any progress. Feel free to update here if you are still working on it.

terrytangyuan • Aug 16 '22 17:08

/assign @adeniyistephen

adeniyistephen • Sep 20 '22 11:09

@terrytangyuan I'm still interested in this issue; I have been busy with work and haven't had time to dive into it further. Can I be reassigned to it?

adeniyistephen • Sep 20 '22 11:09

You can just work on it without being assigned.

terrytangyuan • Sep 25 '22 01:09

@alexec I have made changes to limit the number of workflows with ResourceQuota and have the following observations:

  • When a hard limit is set for count/workflows.argoproj.io, completed workflows are counted too. For example, if the limit is set to 5k and there are already 5k completed workflows, we cannot create a new workflow until we delete one of the completed ones.
  • This is different for built-in resources such as pods, where the ResourceQuota documentation describes the pods quota as "The total number of Pods in a non-terminal state that can exist in the namespace. A pod is in a terminal state if .status.phase in (Failed, Succeeded) is true." So here the count effectively limits only active (non-terminal) pods.
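The difference between the two counting rules observed above can be modelled in a few lines. This is a simplified sketch of the semantics as described in this comment, not the API server's quota implementation; Obj, quota_usage and admits are hypothetical helpers, and only the pod rule (Succeeded and Failed pods are not counted) comes from the quoted documentation.

```python
from dataclasses import dataclass
from typing import Iterable

# Pod phases the quoted documentation treats as terminal.
TERMINAL_POD_PHASES = {"Succeeded", "Failed"}


@dataclass
class Obj:
    kind: str   # "Pod" or "Workflow"
    phase: str  # status phase, e.g. "Pending", "Running", "Succeeded"


def quota_usage(objects: Iterable[Obj], kind: str) -> int:
    """Usage as the quota sees it: pods skip terminal ones, workflows count everything."""
    if kind == "Pod":
        return sum(1 for o in objects
                   if o.kind == "Pod" and o.phase not in TERMINAL_POD_PHASES)
    # count/workflows.argoproj.io counts every existing object, finished or not.
    return sum(1 for o in objects if o.kind == kind)


def admits(objects: Iterable[Obj], kind: str, hard: int) -> bool:
    """Would creating one more object of `kind` stay within the hard limit?"""
    return quota_usage(list(objects), kind) + 1 <= hard
```

With a hard limit of 5, five Succeeded workflows block a sixth workflow, while five Succeeded pods do not block a new pod.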

Shall I go ahead on that basis? If so, we might have to implement automatic workflow cleanup to make sure there is quota left for new workflows.
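The automatic cleanup suggested here could, for instance, delete the oldest completed workflows just until new ones fit under the quota. This is a hypothetical sketch: WorkflowInfo and select_for_cleanup are illustrative names, and it assumes Succeeded, Failed and Error are the completed workflow phases and that a finish timestamp is available for ordering.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Assumed set of workflow phases that count as "completed".
COMPLETED_PHASES = {"Succeeded", "Failed", "Error"}


@dataclass(frozen=True)
class WorkflowInfo:
    name: str
    phase: str
    finished_at: float  # e.g. a Unix timestamp; only used for completed workflows


def select_for_cleanup(workflows: Sequence[WorkflowInfo], hard: int,
                       incoming: int = 1) -> List[str]:
    """Names of the oldest completed workflows to delete so that `incoming`
    new workflows fit under a count/workflows.argoproj.io limit of `hard`.

    If there are not enough completed workflows, every completed one is
    returned and the caller must check whether that frees enough quota.
    """
    excess = len(workflows) + incoming - hard
    if excess <= 0:
        return []  # already enough room, nothing to delete
    completed = sorted((w for w in workflows if w.phase in COMPLETED_PHASES),
                       key=lambda w: w.finished_at)
    return [w.name for w in completed[:excess]]
```

Deleting only the minimum number of oldest completed workflows keeps recent results visible for as long as the quota allows.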

anilkumar-pcs • Oct 21 '22 05:10

@alexec Please take a look at the changes in #9878, which address this issue.

anilkumar-pcs • Oct 21 '22 16:10