
Skip schedule points if we've done one recently enough

Open · oremanj opened this issue 5 years ago · 4 comments

This is a performance improvement that we keep talking about, but I couldn't find an issue for it, except #32 which is broader-reaching.

The idea: checkpoint_if_cancelled() (which is also used by checkpoint() in the not-cancelled case) should not actually yield to the scheduler if the current task has already yielded within the past... some amount of time, probably in the 0.5 ms to 5 ms range.

Exception: we should always yield on the first checkpoint after an assignment to Task.coro or Task.context, because there is code in the wild that relies on this as a way to pick up the effects of those assignments. This can be done by making these members into properties that set some flag/etc.
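For illustration only, a minimal sketch of that "property that sets a flag" mechanism, using invented attribute names rather than actual trio internals:

class TaskLike:
    # Illustrative stand-in for trio's Task; _must_yield_next_checkpoint is a
    # made-up flag that the checkpoint-skipping logic would consult.
    def __init__(self, coro):
        self._coro = coro
        self._must_yield_next_checkpoint = False

    @property
    def coro(self):
        return self._coro

    @coro.setter
    def coro(self, new_coro):
        self._coro = new_coro
        # Force the next checkpoint to really yield, so code that swaps out
        # the coroutine (or context) sees its effects immediately.
        self._must_yield_next_checkpoint = True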

We should measure whether it works better to do this in unrolled_run() (so checkpoint_if_cancelled() remains unchanged, but the scheduler immediately resumes the same task if it hasn't been long enough) or in checkpoint_if_cancelled() (the "when last yielded" timestamp would be a member of Task in that case). It depends on how the overhead of yielding (which will be worse for deeper callstacks) compares to the overhead of looking up the current task from thread-locals.
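For the checkpoint_if_cancelled()-side option, a rough user-level sketch of the idea above (SKIP_THRESHOLD, cheap_checkpoint, and the side table are invented names; a real implementation would keep the timestamp on Task and live inside trio's internals):

import trio

SKIP_THRESHOLD = 0.001  # 1 ms, i.e. somewhere in the 0.5 ms to 5 ms range above

# A real implementation would store this on the Task itself; a side table is
# used here only to keep the sketch outside trio's internals.
_last_yield_time: dict[trio.lowlevel.Task, float] = {}

async def cheap_checkpoint() -> None:
    task = trio.lowlevel.current_task()
    now = trio.current_time()
    if now - _last_yield_time.get(task, float("-inf")) < SKIP_THRESHOLD:
        # This task yielded recently: skip the scheduler round-trip. Note that
        # this also delays delivery of a pending Cancelled by up to
        # SKIP_THRESHOLD, which is part of the trade-off being discussed.
        return
    _last_yield_time[task] = now
    await trio.lowlevel.checkpoint()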

It should probably be possible (for the benefit of tests & other code that relies on the reschedule-every-tick assumption) to disable checkpoint skipping in a particular region, e.g. with a context manager that sets some flag on the current task.
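Continuing the sketch above, the opt-out could look something like this (again with invented names; cheap_checkpoint() would additionally check task in _no_skip_tasks before skipping):

from contextlib import contextmanager

import trio

# Tasks that have asked for the old reschedule-at-every-checkpoint behaviour.
_no_skip_tasks: set[trio.lowlevel.Task] = set()

@contextmanager
def always_yield_at_checkpoints():
    # Not reentrancy-safe; a real version would use a counter or a flag on Task.
    task = trio.lowlevel.current_task()
    _no_skip_tasks.add(task)
    try:
        yield
    finally:
        _no_skip_tasks.discard(task)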

oremanj · Oct 12 '20 17:10

Since this is a pure optimization, we can also consider approximations. E.g., keep a single global (or thread local) record of the last time any cancel_shielded_checkpoint was executed.
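For concreteness, a sketch of that coarser variant, with one process-wide timestamp instead of one per task (names invented; it would want to be thread-local if multiple trio runs can coexist):

import trio

SKIP_THRESHOLD = 0.001  # same 0.5 ms to 5 ms ballpark as above
_last_yield_anywhere = float("-inf")

async def cheap_checkpoint_global() -> None:
    global _last_yield_anywhere
    now = trio.current_time()
    if now - _last_yield_anywhere < SKIP_THRESHOLD:
        return  # something yielded recently enough; skip this one
    _last_yield_anywhere = now
    await trio.lowlevel.checkpoint()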

We'll also want to think about how this affects test determinism and any future pluggable scheduler support. Though those are low-level enough, and cancel_shielded_checkpoint is performance-critical enough, that it might make sense to monkeypatch in a different implementation in those cases.

njsmith · Oct 12 '20 19:10

Selecting some hard-coded threshold sounds... not suitable for the variety of computers and environments, present and future?

I'd like to be able to run some blessed calibration code on a particular platform that estimates the overhead of a checkpoint (the yield + scheduler's machinery), and suggests a reasonable threshold range. The application needs to decide where it wants to be in that range, trading off wasted CPU cycles vs. scheduling responsiveness.
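One shape that calibration could take (purely illustrative: it times a burst of checkpoints under no load and derives a threshold from a target overhead budget):

import time

import trio

async def estimate_checkpoint_overhead(n: int = 10_000) -> float:
    # Average cost of one full trip through the scheduler, measured with a
    # single runnable task, so it's a lower bound for a busy program.
    start = time.perf_counter()
    for _ in range(n):
        await trio.lowlevel.checkpoint()
    return (time.perf_counter() - start) / n

async def main() -> None:
    per_checkpoint = await estimate_checkpoint_overhead()
    # Suggest a threshold that keeps checkpoint overhead around 1% of CPU
    # time; the application then trades that off against responsiveness.
    suggested = per_checkpoint * 100
    print(f"~{per_checkpoint * 1e6:.1f} us per checkpoint; "
          f"suggested skip threshold ~{suggested * 1e3:.2f} ms")

trio.run(main)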

belm0 · Oct 14 '20 11:10

by the way, we track scheduling responsiveness like this:

# Latency of sleep(0) is used as a proxy for trio scheduler health.
# A large latency implies that there are tasks performing too much work
# between checkpoints.
# periodic() is our helper that yields about every 0.1 s, and
# trio_scheduling_latency_metric is our metrics object.
async for _ in periodic(1 / 10):
    start_time = trio.current_time()
    await trio.sleep(0)
    trio_scheduling_latency_metric.observe(trio.current_time() - start_time)

and typical output is: [screenshot of the scheduling latency metric]

i.e. a typical median pass through the scheduler is 500 usec (on a measly low-powered i5, with a fair number of active tasks at any moment), and as a soft real-time app that's kind of important to us

belm0 · Oct 14 '20 11:10