icinga2
Disallow config modifications via API during reload
Otherwise, once the new main process has read the config, it misses subsequent modifications made via the old process.
fixes #9365
Tests
Disable the checker. Add enough hosts so that committing items on reload (with `-x debug`) takes long enough for you to trigger a 503 on a request that would otherwise return 200:

```
{"results":[{"code":503,"errors":["Icinga is reloading"],"status":"Object could not be created."}]}
```
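The "enough hosts" part can be scripted. A minimal sketch (the file name and host-name prefix are made up here; the output is written to the current directory for illustration and would go under `/etc/icinga2/conf.d/` in practice):

```shell
# Generate plenty of dummy host objects so that committing config items on
# reload takes long enough to hit the 503 window. File name and host-name
# prefix are illustrative, not anything Icinga expects.
for i in $(seq 1 5000); do
  printf 'object Host "reload-test-%d" { check_command = "dummy" }\n' "$i"
done > many-hosts.conf
wc -l many-hosts.conf
```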
> Once the new main process has read the config
Not only once it's done doing that, but as soon as it started reading the config, it may miss new files that (dis)appear due to changes by the old worker. But the implementation looks like it would block modifications starting before the new worker process is started, so that seems to be fine.
> Disable checker.
Why is this relevant/necessary here?
Is `boost::shared_mutex` guaranteed to be safe to be used over shared memory?
Also, to me this feels similar to some already existing functionality: after all, this is used to communicate some state between the processes, like it's already done with the `SIGUSR2`-based signaling. So it would be nice to have a common mechanism for all of this.
And there's also a general issue remaining: you only do this for the `/v1/objects` API, but there are more things writing config: some `/v1/actions` (add/remove comment/downtime) as well as internal functionality like scheduled downtimes creating the actual downtime object or the expiry of comments/downtimes.
> Why is this relevant/necessary here?
Not to blow up your workstation.
> Is `boost::shared_mutex` guaranteed to be safe to be used over shared memory?
It's guaranteed to synchronise threads. Under Linux a thread is just a process sharing its whole memory with other processes. So: yes.
> Also, to me this feels similar to some already existing functionality: after all, this is used to communicate some state between the processes, like it's already done with the `SIGUSR2`-based signaling. So it would be nice to have a common mechanism for all of this.
As a template, `Ipc` should be generic enough to serve as the common mechanism for not-yet-merged stuff. (Btw., shall I simplify #8429 down to `Ipc<std::atomic<double>>`?)
> And there's also a general issue remaining: you only do this for the `/v1/objects` API, but there are more things writing config: some `/v1/actions` (add/remove comment/downtime) as well as internal functionality like scheduled downtimes creating the actual downtime object or the expiry of comments/downtimes.
Let’s get back to this once we’ve agreed upon the implementation in general.
> Is `boost::shared_mutex` guaranteed to be safe to be used over shared memory?

> It's guaranteed to synchronise threads. Under Linux a thread is just a process sharing its whole memory with other processes. So: yes.
It's not that simple. For example, if it uses a pthread mutex internally, these have to be configured specifically to work over shared memory, see `pthread_mutexattr_setpshared`.
OK, shall I just use that type instead?
Wait!
> Mutexes created with this attributes object are to be shared only among threads in the same process that initialized the mutex. This is the default value for the process-shared mutex attribute.
Why did my test succeed, then?
> Why did my test succeed, then?
Does `boost::shared_mutex` actually use pthreads on your system? And even then, things might be implemented slightly differently between systems and versions and don't have to look obviously broken either. I just brought that up as an example that things aren't as simple as just putting stuff in shared memory.
> OK, shall I just use that type instead?
Based on the documentation, that sounds more suitable for this particular use case.
> internal functionality like scheduled downtimes creating the actual downtime object or expiry of comments/downtimes
Fortunately we won’t lose anything here.
> internal functionality like scheduled downtimes creating the actual downtime object or expiry of comments/downtimes
>
> Fortunately we won’t lose anything here.
I think this could possibly lead to duplicate notifications/history though. But probably still better than continuing checks but not ending downtimes for example (that would probably be the alternative).
I'd rather try to use a common mechanism for all signaling between umbrella and workers.
> We should first go back a step and think about how we want things to work in the end
No, I don’t think so. The concept behind this PR is just fine. At the same time we have no reason to wake sleeping dogs by touching the already existing, working, and well-tested IPC mechanisms such as signals or pipe FD passing between the main process and the spawner – just as we had no reason to rewrite every existing var declaration to auto along with writing the very first auto var.
This is already done for deploys:
https://github.com/Icinga/icinga2/blob/987bb22397b1a16f583e77d5d84b70fcf283fa8f/lib/remote/configstageshandler.cpp#L134-L137
No reason not to do it here as well. Oh, even better: there is a reason to do it – the one why #9267 was done. And backported.
Oh, and what about Windows? I think the exclusive lock shouldn't exist there at all (it's only used in the non-Windows reload code) and the shared lock should be a no-op (if there can't be an exclusive lock, you can basically have infinitely many shared ones).