icinga2
icinga2 copied to clipboard
CpuBoundWork#CpuBoundWork(): don't spin on atomic int to acquire slot
This is inefficient and implies possible bad surprises regarding waiting durations on busy nodes. Instead, use AsioConditionVariable#Wait() if there are no free slots. It's notified by others' CpuBoundWork#~CpuBoundWork() once finished.
fixes #9988
Also, the current implementation is a spin-lock. 🙈 https://github.com/Icinga/icinga2/pull/10117#issuecomment-2376219250
Low load test
[2024-02-07 12:16:44 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:51732 (no client certificate)
[2024-02-07 12:16:44 +0100] information/CpuBoundWork: Using one free slot, free: 12 => 11
[2024-02-07 12:16:44 +0100] information/CpuBoundWork: Releasing one used slot, free: 11 => 12
[2024-02-07 12:16:44 +0100] information/CpuBoundWork: Using one free slot, free: 12 => 11
[2024-02-07 12:16:44 +0100] information/CpuBoundWork: Releasing one used slot, free: 11 => 12
[2024-02-07 12:16:44 +0100] information/CpuBoundWork: Using one free slot, free: 12 => 11
[2024-02-07 12:16:44 +0100] information/CpuBoundWork: Releasing one used slot, free: 11 => 12
[2024-02-07 12:16:44 +0100] information/HttpServerConnection: Request: GET /v1/objects/services/3d722963ae43!4272 (from [::ffff:127.0.0.1]:51732), user: root, agent: curl/8.4.0, status: Not Found).
[2024-02-07 12:16:44 +0100] information/HttpServerConnection: HTTP client disconnected (from [::ffff:127.0.0.1]:51732)
[2024-02-07 12:16:44 +0100] information/CpuBoundWork: Using one free slot, free: 12 => 11
[2024-02-07 12:16:44 +0100] information/CpuBoundWork: Releasing one used slot, free: 11 => 12
Just a few increments/decrements. 👍
High load test
If I literally DoS Icinga with https://github.com/Al2Klimov/i2all.tf/tree/master/i2dos, I get a few of these:
[2024-02-07 12:19:37 +0100] information/CpuBoundWork: Handing over one used slot, free: 0 => 0
After I stop that program and fire one curl as in my low load test above, I get the same picture: still 12 free slots. 👍
Logs
--- lib/base/io-engine.cpp
+++ lib/base/io-engine.cpp
@@ -24,6 +24,7 @@ CpuBoundWork::CpuBoundWork(boost::asio::yield_context yc)
std::unique_lock<std::mutex> lock (sem.Mutex);
if (sem.FreeSlots) {
+ Log(LogInformation, "CpuBoundWork") << "Using one free slot, free: " << sem.FreeSlots << " => " << sem.FreeSlots - 1u;
--sem.FreeSlots;
return;
}
@@ -32,7 +33,9 @@ CpuBoundWork::CpuBoundWork(boost::asio::yield_context yc)
sem.Waiting.emplace(&cv);
lock.unlock();
+ Log(LogInformation, "CpuBoundWork") << "Waiting...";
cv.Wait(yc);
+ Log(LogInformation, "CpuBoundWork") << "Waited!";
}
void CpuBoundWork::Done()
@@ -42,8 +45,10 @@ void CpuBoundWork::Done()
std::unique_lock<std::mutex> lock (sem.Mutex);
if (sem.Waiting.empty()) {
+ Log(LogInformation, "CpuBoundWork") << "Releasing one used slot, free: " << sem.FreeSlots << " => " << sem.FreeSlots + 1u;
++sem.FreeSlots;
} else {
+ Log(LogInformation, "CpuBoundWork") << "Handing over one used slot, free: " << sem.FreeSlots << " => " << sem.FreeSlots;
sem.Waiting.front()->Set();
sem.Waiting.pop();
}
Is the way boost::asio::deadline_timer is used by AsioConditionVariable here actually safe? It's documentation says that shared objects are not thread-safe.