
Task Deduplication

Open Evanev7 opened this issue 2 months ago • 1 comments

Motivation

Sometimes a worker would send a runner a task that had already completed. This was caused by a race between the task status being updated to completed (which happens first) and the runner being set to ready (which is seen first, because it doesn't need the round trip through the master to take effect).

Changes

The runner supervisor now tracks completed tasks and ignores them if they are sent again, and the planner now also skips tasks that the runner has already completed.

Why It Works

We essentially skip the network -> apply-to-state round trip by doing a local check against the runner's local state.
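A minimal sketch of the idea, not the actual exo code: the class, method, and field names below (`RunnerSupervisor`, `completed_task_ids`, `submit`, `plan`) are hypothetical and only illustrate a local completed-task check on both the supervisor and planner side.

```python
from dataclasses import dataclass, field


@dataclass
class RunnerSupervisor:
    """Hypothetical stand-in for the runner supervisor (names are illustrative)."""

    # IDs of tasks this runner has already finished, tracked locally.
    completed_task_ids: set[str] = field(default_factory=set)
    # Tasks that were actually handed to the runner.
    started: list[str] = field(default_factory=list)

    def mark_completed(self, task_id: str) -> None:
        # Recorded as soon as the runner finishes, without waiting for
        # the master to apply the status update to global state.
        self.completed_task_ids.add(task_id)

    def submit(self, task_id: str) -> bool:
        """Return True if the task was started, False if ignored as a duplicate."""
        if task_id in self.completed_task_ids:
            return False
        self.started.append(task_id)
        return True


def plan(pending_task_ids: list[str], supervisor: RunnerSupervisor) -> list[str]:
    """Planner side: skip tasks the runner already reports as completed."""
    return [t for t in pending_task_ids if t not in supervisor.completed_task_ids]
```

Both checks read the same local set, so a stale "pending" status that has not yet round-tripped through the master cannot cause the task to run a second time.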

Test Plan

Manual Testing

None; race conditions are hard to reproduce locally.

Automated Testing

Should be covered by test_event_ordering, which asserts that the status updates are emitted after the task completion events, and by an assertion that we only complete a task while in a "task running" state - Connecting, Loading, etc.
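As a rough illustration of the kind of ordering check described here (this is not the body of test_event_ordering; the event names and helper are assumptions made for the example):

```python
# Hypothetical event labels; the real event types may be named differently.
TASK_COMPLETED = "task_completed"
RUNNER_READY = "runner_ready"


def completion_emitted_before_ready(events: list[str]) -> bool:
    """True if the first completion event precedes the first ready status update."""
    if TASK_COMPLETED not in events or RUNNER_READY not in events:
        return False
    return events.index(TASK_COMPLETED) < events.index(RUNNER_READY)
```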

Future work

The runner's tasks generally fit an "In Progress" and "Ready" pair. While this is logical, it is a bit verbose.

Evanev7 avatar Dec 31 '25 01:12 Evanev7

Looks good. Did some sanity checks launching instances, chatting and deleting instances, on a 2 node cluster and all was good.

AlexCheema avatar Dec 31 '25 17:12 AlexCheema