sledge-serverless-framework
Poor Tail Latency in Concurrency Experiment
The concurrency experiment executes the "empty" workload (which calls printf("S\n") and immediately exits) at concurrency levels ranging from 1 to 100. The way hey is invoked produces extreme bursts of requests: 0 -> 1 -> 0 -> 20 -> 0 -> 40 -> 0 -> 60 -> 0 -> 80 -> 0 -> 100. The spec.json defines a single module, so all requests share the same relative deadline; as such, FIFO and EDF should behave similarly.
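For reference, the workload is roughly equivalent to the following sketch (reconstructed from the description above, not copied from the repo):

```c
/* Sketch of the "empty" workload: it writes a single line and returns
 * immediately, so each sandbox completes almost instantly and the runtime
 * ends up switching sandboxes at a very high rate. */
#include <stdio.h>

int main(void)
{
	printf("S\n");
	return 0;
}
```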
During a recent run of this experiment, I witnessed p100 tail latency of up to 20s across all scheduling policies. The logs and charts for this run are as follows:
server_concurrency_gnuplots.zip
Note that based on the "offset" reported by hey, it appears that the worst tail latency occurs in batches of roughly 10ms.
Based on all this, I suspect that (assuming this is reproducible externally) there is a context/sandbox-switching bug. I believe we aren't seeing it in other experiments because this experiment triggers a significantly higher rate of sandbox switches due to:
- "empty" workload executing to completion very quickly
- The bursty request pattern of the driver script.
My suggestion would be to validate and debug this as follows:
- Sanity-check the hey parameters
- Run on CloudLab to see if this relates to my hacky home office setup
- Refactor the experiment to use server-side reporting
- If this is visible with server-side metrics on CloudLab, then there is likely a bug
- I suspect that focusing on the state of the system during the ~10ms windows of reported long-running requests is a good bet. The question is why these requests were unable to propagate through the system. Did a worker spin? Did the sandboxes block and get caught in epoll? Looking at the timing of sandbox state transitions for these sandboxes might help (see the instrumentation sketch after this list).
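As a starting point, here is a rough server-side instrumentation sketch; the enum, function names, and log format are assumptions for illustration, not the repo's actual API:

```c
/* Timestamp every sandbox state transition so a post-run analysis can show
 * where the stalled requests spent their ~10ms (runnable but unscheduled,
 * blocked in epoll, stuck behind a spinning worker, etc.).
 * All names here are hypothetical. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

enum sandbox_state { SANDBOX_RUNNABLE, SANDBOX_RUNNING, SANDBOX_BLOCKED, SANDBOX_COMPLETE };

static uint64_t now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Call on every transition; comparing consecutive timestamps per sandbox
 * reveals which state a request was stuck in during the missing window. */
static void sandbox_log_transition(uint64_t sandbox_id, enum sandbox_state new_state)
{
	fprintf(stderr, "sandbox=%" PRIu64 " state=%d t=%" PRIu64 "ns\n",
	        sandbox_id, (int)new_state, now_ns());
}
```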
Possibly relevant issues:
- #224
- #219
- #66