prefect icon indicating copy to clipboard operation
prefect copied to clipboard

Agent: Add limit to control number of concurrent flow runs

Open eudyptula opened this issue 2 years ago • 1 comments

Added a --limit option on 'prefect agent start' to limit the number of flow runs an agent will start simultaneously.

Example

prefect agent start -l 4 prefect agent start --limit 4

Checklist

  • [ ] This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • [ ] This pull request includes tests or only affects documentation.
  • [ ] This pull request includes a label categorizing the change e.g. fix, feature, enhancement

eudyptula avatar Oct 28 '22 10:10 eudyptula

Deploy Preview for prefect-orion ready!

Built without sensitive environment variables

Name Link
Latest commit f008732185e7cd2731a3eba7e905dd68e87b0fcb
Latest deploy log https://app.netlify.com/sites/prefect-orion/deploys/638631142e89bc0008e57f45
Deploy Preview https://deploy-preview-7361--prefect-orion.netlify.app/api-ref/prefect/agent
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

netlify[bot] avatar Oct 28 '22 10:10 netlify[bot]

Hi! Any news on the state of this PR? Having a limit on the agent level is a really nice feature that my projects could benefit as well

andreas-ntonas avatar Nov 09 '22 15:11 andreas-ntonas

@eudyptula thanks for the pull request! This looks reasonable to me. We may want to clarify that this is a limit on running flows as we could also introduce a separate limit on concurrent submissions. We'll need tests before this can be accepted.

zanieb avatar Nov 09 '22 17:11 zanieb

@madkinsz The agent should run no more than N flow runs concurrently with this new parameter. It will not take more flow runs from the work queue, than it can handle. Meaning that the remaining (scheduled) flow runs will stay in the queue for any other agent to pick up. At least if I made the update as I intended :D seems to work for us though.

Hope it answers your question?

Came out of this question on slack, btw.: https://prefect-community.slack.com/archives/CL09KU1K7/p1666864461860479

eudyptula avatar Nov 11 '22 06:11 eudyptula

Yep it looks like that's what it does :)

We could also release capacity limit slots after task_status.started() is called which would make the limit N flow runs submitting concurrently. Submission concurrency could be useful for some users as well and we'll want to make sure it's clear that the --limit here applies to the full life cycle of the flow run infrastructure so when/if we add another type of limit the interface is still intuitive. I'm thinking ahead to a name for the submitting limit e.g. --submitting-limit and wondering if we should do something similar here e.g. --running-limit, but I think --limit is actually more accurate in this case.

zanieb avatar Nov 11 '22 07:11 zanieb

@madkinsz Not sure I completely understand what submission limit is actually limiting...

eudyptula avatar Nov 14 '22 07:11 eudyptula

@eudyptula Well the process of creating infrastructure can be rate limited by cloud providers so you may want to have that concurrency limited. Anyway, I wouldn't worry about that.

If you add tests here, we can accept this :)

zanieb avatar Nov 14 '22 16:11 zanieb

@madkinsz: I doubt it, but just realised that we may have to clarify if this update could somehow cause containers to not be shut down on crashes (in regard to my issue https://github.com/PrefectHQ/prefect/issues/7560)

eudyptula avatar Nov 17 '22 10:11 eudyptula

@madkinsz:

  • Fixed unittests so they should actually run now
  • Realised that the agent may not take the flow runs in a sensible order if there're more than one queue, so added a sort on next_scheduled_start_time
  • Wasn't a little unsure about this, but shouldn't get_and_submit_flow_runs() only return a list of the runs it actually submitted? Hope the filter I added makes sense for this. As far as I saw the return value is only used in unittests.
  • Finally, I added a quick unittest. Wasn't sure how to get the agent to actually take the flows out of the queue, so right now the submitted_flow_run_ids list just seems to continue increasing within the test (the assert account for this however).

eudyptula avatar Nov 17 '22 14:11 eudyptula

@madkinsz is "limit on concurrent submissions" in pipeline? This feature would be really useful.

MuFaheemkhan avatar Nov 29 '22 05:11 MuFaheemkhan

@MuFaheemkhan it should be trivial to add following this pattern if it works well.

zanieb avatar Nov 29 '22 15:11 zanieb

@eudyptula just fyi, I rebased onto main to resolve the conflict

Edit: ~Looks like it stole ownership of your commits... I'll try to fix that.~ The original author is restored now.

zanieb avatar Nov 29 '22 15:11 zanieb

I want to add that I have been using this PR directly for a couple of weeks and I can confirm it works the way it is supposed to! :) Also I noticed that https://github.com/PrefectHQ/prefect/pull/7362 could supplement this feature really well, like adding a tag that helps us identify the agent where the flow was executed.

andreas-ntonas avatar Nov 29 '22 16:11 andreas-ntonas

@andreas-ntonas, Happy to hear you're finding it useful :)

@madkinsz, Just glad to see it making it through :) Looking forward to ditch my fork and use the official releases again, and when the network issues are resolved, hopefully we can finally complete our migration to Prefect 2.

eudyptula avatar Dec 01 '22 06:12 eudyptula

@eudyptula while the real issue is buried deep in some http core libraries, #7593 should alleviate all of the issues for ya'll :)

zanieb avatar Dec 01 '22 06:12 zanieb

Hey @madkinsz is this released for public use? If not, what is the timeline for its release? This seems like an important feature. Thanks :)

rsampaths16 avatar Feb 03 '23 11:02 rsampaths16