greptimedb
greptimedb copied to clipboard
Limit CPU and memory comsumption in resource-limited environment
What problem does the new feature solve?
GreptimeDB is designed to scale from even embedded devices to mega scale cloud services. But when it runs on resource-limited devices, like industrial controller based on.Android and Windows, it does not have a framework to limit the resource consumption, namely CPU and memory usage.
What does the feature do?
This issue calls for a resource-limit framework, just like cgroup in Linux kernel, to limit the CPU and memory usage for those dedicated spawned tasks, like flush, compaction, etc.
Implementation challenges
Tokio does not provide instrumentation tools to probe the CPU and memory usage of submitted tasks, we can only wrap the tasks with our own metrics and using rate limiting strategies to limit inflight tasks.
This reminds me a discussion with @zyy17 about an general abstraction of task spawning:
- an implementation independent spawn function as entrypoint
- option to spawn to current thread, to thread pool, to local subprocess or remote process (managed by various resource manager, like k8s)
- optional option to sepecify compute resources for the task
I have applied for OSPP 2024 with this sub project. Here's the reference: https://summer-ospp.ac.cn/org/prodetail/2432c0077?lang=en&list=pro Here's the real-time records: https://fvd360f8oos.feishu.cn/docx/QTb0d75gpoCoHGxL6Q3cx9KBnig
Task Goals and Technical Solutions
I. Task Basic Goals
-
Encapsulate and modify common/runtime, achieve task priority, and submit a set of runtime libraries
- Modify the
common/runtime
in Greptime (which is a layer of encapsulation oftokio
and has three thread pools to handle different tasks) to achieve the function of task priority.
- Modify the
-
Implement corresponding integrity and performance tests, do a good job in corresponding analysis, records, and source code that can be quickly reproduced
- Ensure comprehensive testing of the modified
runtime
library, including functional integrity testing and performance testing. At the same time, detailed analysis and records should be made to be able to quickly reproduce the testing process and results.
- Ensure comprehensive testing of the modified
-
Try some further ideas (such as resource limitations, scheduling algorithms) and do a good job in corresponding summaries, records, and source code that can be quickly reproduced
- Explore further optimization directions such as resource limitations and scheduling algorithms, and summarize and record the process and results of the attempt, and retain the source code that can be quickly reproduced.
II. Technical Solutions
(I) Selection of Pending Timing
-
Probability calculation mode
- Calculate directly with probability (it is more flexible to set, but there may be too many consecutive pendings).
-
Fixed interval mode
- Priority 5, probability 0.5: 1 0 1 0
- Priority 4, probability 0.66: 11 0 11 0
- Priority 3, probability 0.75: 111 0 111 0
- Priority 2, probability 0.9 (10/11): 1111111111 0 1111111111 0
- Normal (no special priority): 11111111111111
-
Mode based on time accumulation and division
- Record the accumulated time
adding_up_time
andpend_t
. Divide the entire running process withpend_t
, and the corresponding division point is the timing ofpend
. - When scheduling each time, see how many
pend_t
are contained inadding_up_time
, andpend
as many times (or onlypend
once). Then take the remainder ofadding_up_time
. This idea can more accurately control the frequency of triggeringpend
(how many timespend
is triggered per unit time), and the period ispend_t
.
- Record the accumulated time
(II) Pend Recovery Mechanism
-
Prevent tasks from not being scheduled again
- Returning
pend
directly may cause the awakened task to never be scheduled again in the future. - The idea adopted is to register the
waker
again before returningpend
andwake
again later (add to the scheduling queue). - The specific approach is to use a separate delayed wake-up task for wake-up.
-
tokio::task::yield_now()
offered an example
- Returning
(III) Control Closed-Loop CPU and Dynamically Adjust the Pending Probability
-
Core idea
- When the CPU usage of the thread pool exceeds the set threshold, the deviation value is reflected as the
pending
probability sensitivity coefficient.
- When the CPU usage of the thread pool exceeds the set threshold, the deviation value is reflected as the
-
Observation of computational amount
-
https://docs.rs/tokio-metrics/latest/tokio_metrics/struct.TaskMetrics.html
: This library has relatively complete observation of the delay related totask
. -
https://docs.rs/sysinfo/latest/sysinfo/
: This library has the ability to observe resources across platforms.
-
@ActivePeter we can start with introducing a new wrapper runtime that can set different priorities for different tasks even if it's not yet referenced in the code base. Feel free to draft a PR once you're ready.