glommio icon indicating copy to clipboard operation
glommio copied to clipboard

fix preemption timer when using `rush_dispatch()`

Open HippoBaro opened this issue 3 years ago • 0 comments

The idea behind rush_dispatch is great. Sometimes, it is advantageous to rush to the rings and performs some io_uring transactions immediately.

Our implementation suffers from one deadly bug, however. What if, while consuming the CQEs, we get the preemption timer back? Uh oh! If that happens, we silently throw it away. And because the pointers to the ring have now changed (and are equal), need_preempt returns false. Uh oh!! Even worse, we now don't have a preemption timer, so we will run every task in the system to completion (or until Poll::Pending) before going to the reactor once again.

The fix is to check whether the timer has returned after rushing to the rings. We also add a new variant of need_preempt(), need_preempt_check_timer() that explicitly checks the timer too. This more expensive check is only used in-between scheduling two queues to avoid undue overhead.

Therefore, the control flow goes like this:

  • rush_dispatch is called, and the timer is collected during the flush;
  • rush_dispatch detects that the timer has returned and marks the task queue for yield;
  • the scheduler calls need_preempt_check_timer() before scheduling the next queue and short-circuits back to the reactor;
  • a new preemption timer is armed, and we go on our merry way.

HippoBaro avatar Dec 29 '21 04:12 HippoBaro