sledge-serverless-framework

RUNTIME_SIGALRM_HANDLER_TRIAGED causes missed epoll updates

Open • bushidocodes opened this issue 3 years ago • 1 comment

RUNTIME_SIGALRM_HANDLER_TRIAGED is a variant of the EDF scheduler that attempts to triage which workers a SIGALRM might actually preempt, based on deadline, in order to reduce spurious SIGALRMs. However, now that the SIGALRM handler is also responsible for checking the thread-local epoll for data, this optimization might leave sandboxes blocked for unexpectedly long: a worker that is skipped by the triage does not poll its epoll during that quantum.

A possible solution is to add a runtime array that tracks the number of sandboxes in the blocked state per worker. A worker with a non-zero count should always be forwarded the SIGALRM so that it checks its epoll handler each quantum.
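A minimal sketch of the per-worker counter idea in C, not the runtime's actual code. Every name here (`WORKER_COUNT`, `blocked_sandbox_count`, `worker_blocked_increment`, `worker_blocked_decrement`, `sigalrm_should_forward`) is hypothetical. The deadline comparison is also an assumption about how the existing triage decides whether a worker could be preempted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worker count; the real runtime would use its own configuration. */
#define WORKER_COUNT 4

/* Number of sandboxes currently in the blocked state, indexed by worker.
 * Atomics are used because workers update their own slot while the thread
 * handling SIGALRM reads every slot. Static storage is zero-initialized. */
static _Atomic uint32_t blocked_sandbox_count[WORKER_COUNT];

/* Called by a worker when one of its sandboxes enters the blocked state. */
static inline void
worker_blocked_increment(size_t worker_idx)
{
	atomic_fetch_add(&blocked_sandbox_count[worker_idx], 1);
}

/* Called by a worker when one of its sandboxes leaves the blocked state. */
static inline void
worker_blocked_decrement(size_t worker_idx)
{
	uint32_t previous = atomic_fetch_sub(&blocked_sandbox_count[worker_idx], 1);
	assert(previous > 0); /* a decrement must match an earlier increment */
	(void)previous;
}

/* Decide whether SIGALRM should be forwarded to a worker.
 * Assumption: without blocked sandboxes, the triage forwards only when the
 * global head has an earlier deadline than the worker's current sandbox. */
static inline bool
sigalrm_should_forward(size_t worker_idx, uint64_t worker_deadline, uint64_t global_head_deadline)
{
	/* A worker with blocked sandboxes must poll its epoll every quantum. */
	if (atomic_load(&blocked_sandbox_count[worker_idx]) > 0) return true;

	/* Otherwise fall back to the deadline-based triage. */
	return global_head_deadline < worker_deadline;
}
```

With this shape, the triage optimization still suppresses signals to workers that have nothing blocked, and the cost of the extra check is one atomic load per worker per quantum.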

bushidocodes • May 19 '21 01:05

The proposed solution sounds promising, but consider this case: a worker has a non-zero value (so there is a blocked sandbox at that worker), and meanwhile the worker has picked another sandbox from its local queue (say the global head's deadline was further out in time) and begun executing it. Should the worker really be forwarded the SIGALRM then? Just thinking out loud...

emil916 • May 25 '21 12:05