
Lachesis: A Middleware for Customizing OS Scheduling of Stream Processing Queries

pentium3 opened this issue 1 year ago · 2 comments

http://pages.di.unipi.it/mencagli/publications/preprint-middleware-2021.pdf

https://github.com/dmpalyvos/lachesis

https://www.youtube.com/watch?v=YPMhcfSzG6A

pentium3 · Jun 30 '23 07:06

Here are some things you may find interesting:

  • Lachesis Evaluation: https://github.com/dmpalyvos/lachesis-evaluation
  • Lachesis 2.0, which introduces two improvements: congestion-aware scheduling and real-time Linux threads.
    • Paper: Accelerating Stream Processing Queries with Congestion-aware Scheduling and Real-time Linux Threads.
      • https://dl.acm.org/doi/abs/10.1145/3587135.3592202
    • Code:
      • https://github.com/FauFra/lachesis-mod
      • https://github.com/ParaGroup/Lachesis-RT
  • Master thesis by Fausto Francesco Frasca: Evaluating Linux Kernel Mechanisms for Scheduling Streaming Queries on Multicores
    • Very detailed.
    • Fausto Francesco Frasca is the first author of Lachesis 2.0.
    • The best way to understand Lachesis and Lachesis 2.0.

Sunt-ing · Jul 05 '23 07:07

summary

key problem

workload

single or multiple streaming queries.

optimization goal

xxxxx

configurations to tune

users define a scheduling policy (essentially, rules that assign a priority to each operator thread) through the Lachesis API.

Lachesis then uses OS mechanisms (cgroups, nice) to enforce the user-defined scheduling policies; a rough sketch of the idea follows.
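
As a rough illustration only (this is not Lachesis's actual API; all class and function names below are made up for this note), the idea is that a user-supplied policy function maps operator threads to abstract priorities, and an enforcement step translates those priorities into per-thread nice values:

```python
import os

# Illustrative stand-in for an operator thread discovered by the middleware.
class OperatorThread:
    def __init__(self, name, tid, queue_size):
        self.name = name              # operator name, e.g. "filter-1"
        self.tid = tid                # Linux thread id (the one shown by `ps -eLf`)
        self.queue_size = queue_size  # pending input tuples in the operator's queue

def queue_size_policy(operators):
    """User-defined policy: rank operators with longer input queues as more urgent.

    Returns a dict {tid: rank}, where rank 0 is the most urgent.
    """
    ranked = sorted(operators, key=lambda op: op.queue_size, reverse=True)
    return {op.tid: rank for rank, op in enumerate(ranked)}

def enforce_with_nice(priorities, min_nice=-10, max_nice=10):
    """Map abstract ranks to nice values and apply them to each thread.

    On Linux, nice is a per-thread attribute, so PRIO_PROCESS with a thread id
    adjusts that single thread. Negative nice values require CAP_SYS_NICE/root.
    """
    if not priorities:
        return
    span = max(1, len(priorities) - 1)
    for tid, rank in priorities.items():
        nice = min_nice + round(rank * (max_nice - min_nice) / span)
        os.setpriority(os.PRIO_PROCESS, tid, nice)

# Called periodically, e.g.:
# enforce_with_nice(queue_size_policy(current_operator_threads))
```

The cgroups path would instead assign CPU weights to groups of threads; either way, the policy only decides priorities and the OS mechanism enforces them.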

scenario

xxxxx

technique

xxxxx

dynamic workload?

xxxxx

multi-tenant?

xxxxx

implementation

xxxxx

Problem and motivation

what is the problem this paper is solving?
why is it important?
why is it challenging?

Main ideas and insights

describe the paper gist in 1-2 sentences
what is important to remember? What did we learn?

Solution description

explain how the solution work

examples of scheduling policies [ch. 5.1] (a small sketch follows the list):

  • (1) Queue Size (QS) [18] prioritizes operators with more input tuples in their queues, balancing such queues’ size and, in turn, the operators’ effective utilization, to achieve higher throughput at the Egresses and to lower latency.
  • (2) Highest Rate (HR) [50] prioritizes “operator paths” (branches of a DAG ending at a sink) that are both productive (i.e., with operators having high selectivity) and inexpensive (low cost) with the goal of minimizing the average processing latency of all the tuples in the system.
  • (3) First-Come-First-Serve (FCFS) [7] prioritizes those operators whose input tuples have spent more time in the system, with the goal of minimizing the maximum latency.
  • (4) RANDOM gives operators uniformly random priorities.
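
To show how two of these policies could be expressed as interchangeable functions with the same interface as the queue-size example above, here is a hedged sketch (the oldest_tuple_arrival field is hypothetical, not something the paper defines):

```python
import random
import time

def fcfs_policy(operators):
    """FCFS: rank first the operators whose oldest queued tuple has waited longest."""
    now = time.monotonic()
    # Assumes each operator tracks the arrival time of its oldest pending tuple
    # (a hypothetical field for this sketch).
    ranked = sorted(operators, key=lambda op: now - op.oldest_tuple_arrival, reverse=True)
    return {op.tid: rank for rank, op in enumerate(ranked)}

def random_policy(operators):
    """RANDOM: assign uniformly random ranks; useful as a baseline."""
    shuffled = list(operators)
    random.shuffle(shuffled)
    return {op.tid: rank for rank, op in enumerate(shuffled)}
```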

Important results

describe the experimental setup
summarize the main results

Limitations and opportunities for improvement

when doesn't it work?
what assumptions does the paper make and when are they valid?

Closely related work

list of main competitors and how they differ

Follow-up research ideas (Optional)

If you were to base your next research project on this paper, what would you do?
Propose concrete ways to achieve one or more of the following:

  • Build a better (faster, more efficient, more user-friendly...) system to solve the same problem
  • Solve a generalization of the problem
  • Address one of the work's limitations
  • Solve the same problem in a different context
  • Solve the problem at a much larger scale
  • Apply the paper's methods to a different (but similar) problem
  • Solve a new problem created by this work

pentium3 · Mar 21 '24 04:03