Multi-select flow runs and set state in bulk (to clear late runs, delete scheduled runs in bulk, and more)

Open anna-geller opened this issue 2 years ago • 16 comments

First check

  • [X] I added a descriptive title to this issue.
  • [X] I used the GitHub search to find a similar request and didn't find it.
  • [X] I searched the Prefect documentation for this feature.

Prefect Version

2.x

Describe the current behavior

Currently, there is no easy way to:

  1. clear late runs
  2. cancel multiple scheduled runs
  3. delete many runs with arbitrary state or tag when needed

Describe the proposed behavior

Create a feature that allows users to:

  1. Filter runs by state (e.g., Late, Scheduled, Failed), tag, or work queue name
  2. Select all matching runs (e.g., from the UI, CLI, or Python client)
  3. Change the state of all selected runs in bulk, or delete them
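Until something like this exists in the UI, steps 1-3 can be roughly approximated with the Prefect 2.x Python client. The sketch below is an illustration under assumptions, not an official recipe: it assumes a Prefect 2.x release where the filter models live in `prefect.client.schemas.filters` (earlier 2.x releases used a different module path), and `cancel_matching_runs` is a hypothetical helper name. It reads one page of runs matching a state name and, optionally, a tag, then moves each into a Cancelled state.

```python
from typing import Optional


async def cancel_matching_runs(
    state_name: str = "Late",
    tag: Optional[str] = None,
    limit: int = 200,
) -> int:
    """Hypothetical helper: cancel up to `limit` flow runs in `state_name`.

    If `tag` is given, only runs carrying that tag are matched. Returns how
    many runs it tried to cancel; call it again until it returns 0 to work
    through a large backlog.
    """
    # Imported here so this module can be loaded without Prefect installed.
    from prefect import get_client
    from prefect.client.schemas.filters import (
        FlowRunFilter,
        FlowRunFilterState,
        FlowRunFilterStateName,
        FlowRunFilterTags,
    )
    from prefect.states import Cancelled

    run_filter = FlowRunFilter(
        state=FlowRunFilterState(name=FlowRunFilterStateName(any_=[state_name])),
        tags=FlowRunFilterTags(all_=[tag]) if tag else None,
    )
    async with get_client() as client:
        runs = await client.read_flow_runs(flow_run_filter=run_filter, limit=limit)
        for run in runs:
            # force=True asks the server to skip orchestration rules that could
            # otherwise reject the transition (assumed acceptable for cleanup).
            await client.set_flow_run_state(run.id, Cancelled(), force=True)
    return len(runs)
```

For example, `asyncio.run(cancel_matching_runs("Late", tag="nightly"))` would cancel up to 200 late runs tagged `nightly` (a made-up tag). Cancelling rather than deleting keeps the run history visible; swapping in `client.delete_flow_run(run.id)` would remove the runs instead.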

Example Use

  • Feature parity with Prefect 1.0, which has a "Clear late runs" button
  • Ability to clear late runs scheduled in the past so that they are not picked up when the agent restarts. This is especially useful when an agent goes down and the user doesn't want those late runs to be executed (e.g., you may prefer to delete them and schedule only future runs from the point when the agent was restarted)

Additional context

Imagine this scenario: you have an hourly scheduled deployment. Your agent went down yesterday at 9 AM. You noticed the problem today and restarted the agent process at 9 AM.

Current behavior: 24 late runs are picked up immediately once the agent process restarts. Some users, justifiably, don't like this; they would prefer to clear those 24 runs scheduled in the past and just schedule future runs.
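The arithmetic in this scenario can be sketched in a few lines of Python. This is only an illustration: it assumes runs are scheduled exactly on the hour, treats the outage as the half-open window from 9 AM yesterday up to (but not including) 9 AM today, and uses hypothetical dates and helper names (`hourly_slots`, `split_at`).

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def hourly_slots(start: datetime, end: datetime) -> List[datetime]:
    """Scheduled start times, one per hour, in the half-open interval [start, end)."""
    slots = []
    t = start
    while t < end:
        slots.append(t)
        t += timedelta(hours=1)
    return slots


def split_at(slots: List[datetime], now: datetime) -> Tuple[List[datetime], List[datetime]]:
    """Split slots into past-due ("late") runs and runs that are still in the future."""
    late = [t for t in slots if t < now]
    future = [t for t in slots if t >= now]
    return late, future


# Hypothetical dates: the agent dies at 9 AM and is restarted 24 hours later.
went_down = datetime(2022, 9, 27, 9)
restarted = datetime(2022, 9, 28, 9)
slots = hourly_slots(went_down, restarted + timedelta(hours=3))
late, future = split_at(slots, restarted)
print(len(late), len(future))  # 24 3
```

The `late` list is what the requested feature would clear; the `future` list is what should keep running normally.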

anna-geller, Sep 28 '22

Would LOVE to see this implemented. I have spent MANY hours over the last couple of months clearing "stuck" flow runs after my k8s agent silently crashed. I must clear the pending jobs before deleting the agent pod; otherwise the new agent tries to pick up the (at times hundreds of) pending flow runs, which sends it into a continuous crash loop. Having to select them one by one in the UI when there are 500+ flow runs is incredibly time-consuming and tedious.

efranksrecroom, Jan 20 '23

cc @zhen0 / @billpalombi this looks like it was triaged but I want to be sure it's actually on a roadmap.

zanieb, Jan 20 '23

Thanks @madkinsz! I'm adding to the UI backlog and we can set a priority and see who can pick it up at our team huddle on Monday.

zhen0, Jan 20 '23

@zhen0 Just wanted to add that the pages for flows, deployments, and work queues already support the desired behavior (i.e., anywhere we use lists instead of cards). A checkbox next to the "{n} Flow runs" text above the cards that acts as "select all" would probably work well.

EmilRex, Jan 20 '23

Small update: we want to think through how we clear these runs. We can easily have thousands (or more) of flow runs in the UI at one time, so it's not as simple as reusing the multi-select from the flow/deployment table views. We're actively looking into it.

zhen0, Jan 24 '23

@zhen0 definitely! In case it makes sense, I'll just add that even being able to select one page-worth at a time would be helpful.

EmilRex, Jan 24 '23

@billpalombi - shall we add this to product orchestration's backlog?

zhen0, Jan 25 '23

Can we have a flow deployment flag that just says "don't catch up on missed runs"? I have jobs that really cannot have two instances running at the same time (they insert the same data set into the same table).

cgoodric, Feb 01 '23

Is there any Prefect 2.0 API example that can work around this problem right now? Some ideas were provided in the original discussion (https://github.com/PrefectHQ/prefect/discussions/5005), but most of them don't work with the Prefect 2.0 API. It would also be great, as @cgoodric suggested, to be able to handle this automatically at the API level. Doing the cleanup via the UI every time is very annoying.

krasoffski, Feb 08 '23
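One possible answer to the API question above, as a minimal sketch rather than a supported recipe: the Prefect 2.x Python client can read a flow's late runs and delete them, reusing a single client and running several deletions concurrently. It assumes the filter models are importable from `prefect.client.schemas.filters` (recent 2.x releases; older ones used a different path). `run_bounded` and `delete_late_runs` are hypothetical names, and the default concurrency of 10 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, Iterable
from uuid import UUID


async def run_bounded(
    ids: Iterable[UUID],
    action: Callable[[UUID], Awaitable[object]],
    concurrency: int = 10,
) -> int:
    """Await `action(id)` for every id, at most `concurrency` at a time; return the count."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one(run_id: UUID) -> None:
        async with semaphore:
            await action(run_id)

    id_list = list(ids)
    await asyncio.gather(*(one(run_id) for run_id in id_list))
    return len(id_list)


async def delete_late_runs(flow_name: str, batch_size: int = 200) -> int:
    """Hypothetical helper: delete all runs of `flow_name` in the "Late" state."""
    # Local imports so `run_bounded` above works without Prefect installed.
    from prefect import get_client
    from prefect.client.schemas.filters import (
        FlowFilter,
        FlowFilterName,
        FlowRunFilter,
        FlowRunFilterState,
        FlowRunFilterStateName,
    )

    total = 0
    async with get_client() as client:
        while True:
            # Deleted runs no longer match, so each pass fetches the next batch.
            runs = await client.read_flow_runs(
                flow_filter=FlowFilter(name=FlowFilterName(any_=[flow_name])),
                flow_run_filter=FlowRunFilter(
                    state=FlowRunFilterState(name=FlowRunFilterStateName(any_=["Late"]))
                ),
                limit=batch_size,
            )
            if not runs:
                break
            total += await run_bounded([r.id for r in runs], client.delete_flow_run)
    return total
```

It could be run with `asyncio.run(delete_late_runs("FLOW_NAME"))`. Keeping the concurrency bounded avoids flooding the API server with hundreds of simultaneous requests while still not deleting runs strictly one at a time.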

Hello again,

@anna-geller, can I use this approach to skip outdated flow runs from within the flow function?

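import datetime

from prefect import flow, get_run_logger
from prefect.context import get_run_context
from prefect.states import Cancelled

MAX_OVERDUE = datetime.timedelta(seconds=300)

@flow
def my_flow():
    logger = get_run_logger()
    flow_run = get_run_context().flow_run
    if (
        flow_run.auto_scheduled  # manual runs are always allowed
        and flow_run.estimated_start_time_delta > MAX_OVERDUE
    ):
        logger.warning("Cancelling flow run as outdated")
        return Cancelled()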

krasoffski, Feb 09 '23

I just ran into this issue today and found out it's not possible to delete late runs in bulk. In our case, the vast majority of the time I just want to delete late runs and get fresh data on the next scheduled run. It seems the agent lost its connection, and since it was Easter I didn't notice, so today I spent 20 minutes clicking 486 checkboxes to delete the late runs 🥲

bobpeers, Apr 08 '23

@bobpeers sounds like this could be resolved by #9054

zanieb, Apr 09 '23

@madkinsz Yes that would solve it for me 🫡

bobpeers, Apr 09 '23

Clicking checkboxes isn't fun. We would love to see this implemented.

klayhb, Aug 29 '23

Slow cleanup workaround (1-5s per deletion), but it does the job.

for _ in {1..10}; do
  # Each pass lists up to 100 late runs; tail/head (GNU head) drop the table's header and footer lines
  time prefect flow-run ls --limit 100 --state-type=SCHEDULED --state=Late --flow-name=FLOW_NAME \
    | tail -n +5 | head -n -1 | awk '{print $2}' \
    | while read -r guid; do prefect flow-run delete "$guid"; done
done

th0ger, Nov 14 '23

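// Collect the row checkboxes currently rendered on the page
const buttons = Array.from(document.querySelectorAll('input.p-checkbox__input'));

(async () => {
  for (const b of buttons) {
    // Skip boxes that are already ticked so rerunning doesn't deselect them
    if (b.checked) continue;
    b.click();
    // Short pause between clicks so the UI can keep up
    await new Promise(r => setTimeout(r, 100));
  }
})();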

You can also paste this into the browser console to click every checkbox on the page. Note: it selects only about 50 runs at a time because the list is virtualized, so you must rerun it for each batch of 50 items.

jackharrhy, Jan 23 '24

> Can we have a flow deployment flag that just says "don't catch up on missed runs?" I have jobs that really cannot have two instances running at the same time (trying to insert the same data set into the same table.)

We need this

khgouldy, Feb 25 '24

As of v2.16.5, the UI has a shortcut for selecting/deselecting multiple flow runs at once. The selection respects the filters currently in use, so users can:

  1. Navigate to a page within the UI
  2. Filter by criteria like state
  3. Select all
  4. Delete

This should be available on all pages where flow runs are listed in filterable, selectable views, e.g., the flow runs, flow, and deployment pages.

Please note that these changes affect only the UI.

collincchoy, Mar 25 '24