icinga2 icon indicating copy to clipboard operation
icinga2 copied to clipboard

Disallow config modifications via API during reload

Open Al2Klimov opened this issue 1 year ago • 10 comments

Once the new main process has read the config, it misses subsequent modifications from the old process otherwise.

fixes #9365

Al2Klimov avatar Jul 15 '22 15:07 Al2Klimov

Tests

Disable checker. Add enough hosts so that committing items on reload takes long enough (-x debug) for you to trigger a 503 (for an otherwise 200 one):

{"results":[{"code":503,"errors":["Icinga is reloading"],"status":"Object could not be created."}]}

Al2Klimov avatar Jul 15 '22 16:07 Al2Klimov

Once the new main process has read the config

Not only once it's done doing that, but as soon as it started reading the config, it may miss new files that (dis)appear due to changes by the old worker. But the implementation looks like it would block modifications starting before the new worker process is started, so that seems to be fine.

Disable checker.

Why is this relevant/necessary here?

Is boost::shared_mutex guaranteed to be safe to be used over shared memory?

Also, to me this feels like similar to some already existing functionality: after all, this is used to communicate some state between the processes, like it's already done with the SIGUSR2 based signaling. So it would be nice to have a common mechanism for all of this.

And there's also some general issue remaining: you only do this for the /v1/objects API, but there are more things writing config: some /v1/actions (add/remove comment/downtime) as well as internal functionality like scheduled downtimes creating the actual downtime object or expiry of comments/downtimes.

julianbrost avatar Jul 18 '22 07:07 julianbrost

Why is this relevant/necessary here?

Not to blow up your WS.

Is boost::shared_mutex guaranteed to be safe to be used over shared memory?

It's guaranteed to synchronise threads. Under Linux a thread is just a process sharing its whole memory with other processes. So: yes.

Also, to me this feels like similar to some already existing functionality: after all, this is used to communicate some state between the processes, like it's already done with the SIGUSR2 based signaling. So it would be nice to have a common mechanism for all of this.

As a template Ipc should be generic enough for being the common mechanism for not yet merged stuff. (Btw. shall I simplify #8429 down to Ipc<std::atomic<double>>?)

And there's also some general issue remaining: you only do this for the /v1/objects API, but there are more things writing config: some /v1/actions (add/remove comment/downtime) as well as internal functionality like scheduled downtimes creating the actual downtime object or expiry of comments/downtimes.

Let’s get back to this once we’ve agreed upon the implementation in general.

Al2Klimov avatar Jul 18 '22 08:07 Al2Klimov

Is boost::shared_mutex guaranteed to be safe to be used over shared memory?

It's guaranteed to synchronise threads. Under Linux a thread is just a process sharing its whole memory with other processes. So: yes.

It's not that simple. For example, if it uses a pthread mutex internally, these have to be configured specifically to work over shared memory, see pthread_mutexattr_setpshared.

julianbrost avatar Jul 18 '22 09:07 julianbrost

OK, shall I just use that type instead?

Al2Klimov avatar Jul 18 '22 09:07 Al2Klimov

Wait!

Mutexes created with this attributes object are to be shared only among threads in the same process that initialized the mutex. This is the default value for the process-shared mutex attribute.

Why my test succeeded then?

Al2Klimov avatar Jul 18 '22 09:07 Al2Klimov

Why my test succeeded then?

Does boost::shared_mutex actually use pthread on your system? And even then, things might be implemented slightly different between systems and versions and don't have to look obviously broken either. I just brought that up as an example that things aren't as simple as just putting stuff in shared memory.

OK, shall I just use that type instead?

Based on the documentation, that sound more suitable for that particular use-case.

julianbrost avatar Jul 18 '22 13:07 julianbrost

internal functionality like scheduled downtimes creating the actual downtime object or expiry of comments/downtimes

Fortunately we won’t lose anything here.

Al2Klimov avatar Jul 18 '22 14:07 Al2Klimov

internal functionality like scheduled downtimes creating the actual downtime object or expiry of comments/downtimes

Fortunately we won’t lose anything here.

I think this could possibly lead to duplicate notifications/history though. But probably still better than continuing checks but not ending downtimes for example (that would probably be the alternative).

julianbrost avatar Jul 18 '22 15:07 julianbrost

I'd rather try to use a common mechanism for all signaling between umbrella and workers.

We should first go back a step and think about how we want things to work in the end

No, I don’t think so. The concept behind this PR is just fine. At the same time we don’t have (any reason) to wake sleeping dogs by touching the already existing working and well tested IPC mechanisms such as signals or pipe FD passing between main process and spawner. As well as we didn’t have (any reason) to re-write any var type to auto along with writing the very first one auto var.

Al2Klimov avatar Jul 19 '22 16:07 Al2Klimov

This is already done for deploys:

https://github.com/Icinga/icinga2/blob/987bb22397b1a16f583e77d5d84b70fcf283fa8f/lib/remote/configstageshandler.cpp#L134-L137

No reason not to do it here as well. Oh, even better: there is a reason to do it – the one why #9267 was done. And backported.

Al2Klimov avatar Oct 21 '22 09:10 Al2Klimov

Oh, and what about Windows? I think the exclusive lock shouldn't exist there at all (it's only used in the non-Windows reload code) and the shared lock should be a no-op (if there can't be an exclusive lock, you can basically have infinite shared ones).

julianbrost avatar Mar 31 '23 11:03 julianbrost