distributed icon indicating copy to clipboard operation
distributed copied to clipboard

Overhaul transitions for the resumed state

Open crusaderky opened this issue 3 years ago • 2 comments

  • Mutually exclusive with #6716
  • Mutually exclusive with #6844
  • Closes #6682
  • Closes #6689
  • Closes #6693

crusaderky avatar Jul 09 '22 15:07 crusaderky

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±0         15 suites  ±0   6h 30m 37s :stopwatch: - 35m 18s   3 005 tests +5    2 909 :heavy_check_mark: +1    82 :zzz:  -   7    4 :x: +  1  10 :fire: +10  22 239 runs   - 7  21 186 :heavy_check_mark:  - 6  967 :zzz:  - 84  16 :x: +13  70 :fire: +70 

For more details on these failures and errors, see this check.

Results for commit f190b4b8. ± Comparison against base commit 1d0701b5.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Jul 09 '22 16:07 github-actions[bot]

There are two failing tests. The amount of brain cells that died trying to understand them exhausted me, so I stopped work on this and I restarted from scratch on a more radical approach: #6716

crusaderky avatar Jul 12 '22 13:07 crusaderky

@fjetter I realised that _transition_from_executing is conceptually flawed. It assumes that we're getting there from one of the exit events of execute(), or in other words assert ts.done. Except that the assertion will fail in case of scatter.

e.g. x = c.submit(sleep, 10) (execute jupyter cell) (modify and re-execute jupyter cell within 10 seconds:) x = c.scatter(1)

In this case, there's going to be a transition executing->memory while ts.done is False, followed by a transition memory->memory when execute() completes. You'll end up with the task permanently in WorkerState.executing, permanently 1 less thread to run other tasks on, and permanently reduced resources.

I'm deleting _transition_from_executing and moving all of its logic to _execute_done_common, like in #6844. It's where it should be: a task is removed from the executing set and releases resources iff execute() has just terminated.

crusaderky avatar Aug 15 '22 15:08 crusaderky

I realised that _transition_from_executing is conceptually flawed.

possible

Except that the assertion will fail in case of scatter.

Why is scattering even transitioning via executing? That sounds wrong. I remember concretely having a released->memory transition for this at some point in time

fjetter avatar Aug 16 '22 12:08 fjetter

Why is scattering even transitioning via executing? That sounds wrong. I remember concretely having a released->memory transition for this at some point in time

There's nothing that stops a user from scattering an object while a task with the same key is already running. This is most likely to happen while prototyping on a jupyter notebook.

crusaderky avatar Aug 16 '22 14:08 crusaderky

@fjetter ready for final review and merge

crusaderky avatar Aug 16 '22 23:08 crusaderky