Add API liveness probes
This is a very loose RFC that needs more testing, but first and foremost I would like to confirm that I am at least in the vaguely correct wheelhouse before proceeding further.
Video demo: https://drive.google.com/file/d/1BWlrohNgfKWDLDEENDunhsNnl6W4sDmY/view?usp=share_link (too large to live on GitHub).
Add liveness probes to API
As described in https://github.com/ppy/osu/issues/35736.
Some ground rules:
- I'm not going to attempt to decide where the probe should be hosted; this PR is mostly agnostic to that anyway.
- When API is in online state, the probe is queried every minute (interval up for debate).
  - If the probe URL is not set, nothing happens.
  - If the client can't reach the probe, nothing happens (or at least that's the intended design). This is so that client-local outages or an inability to reach the probe don't knock the rest of the game offline even though everything else may be fine.
  - If deserialisation fails or any other error occurs, nothing happens.
  - The only time anything happens is when the probe is up and actively returns a result showing that there is an active outage. In this case, API is instantly knocked to `Failing` state.
- When API is in failing state, the probe is queried more actively (every 5 seconds as written, but this is also up for discussion).
  - If the probe actively returns a result showing that the outage is still ongoing, the client does not attempt to get back online.
  - In any other circumstance (read: the probe being disabled, unreachable, or returning a result that indicates no active outage), the client will try to get back online.
  - A side benefit is that this could provide relief from retry traffic during known outages.
This is designed this way to be fail-safe. In particular, my concern is that the probe being unavailable to some users (e.g. users in Russia, due to network blocks) should not completely remove online functionality for those users.
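The ground rules above can be sketched as a small decision table. This is a language-neutral illustration in Python, not the code in this PR: `APIState`, `ProbeResult`, and the function names are hypothetical, only two API states are modelled, and the intervals are the as-written values that are still up for debate.

```python
from enum import Enum


class APIState(Enum):
    # Simplified: only the two states discussed above are modelled.
    ONLINE = "online"
    FAILING = "failing"


class ProbeResult(Enum):
    # Hypothetical classification of a single probe query.
    DISABLED = "disabled"        # no probe URL configured
    UNREACHABLE = "unreachable"  # network error, timeout, or response that failed to deserialise
    NO_OUTAGE = "no_outage"      # probe answered and reports no active outage
    OUTAGE = "outage"            # probe answered and reports an active outage


ONLINE_POLL_SECONDS = 60   # as written; up for debate
FAILING_POLL_SECONDS = 5   # as written; up for debate


def poll_interval(state: APIState) -> int:
    """Seconds to wait before the next probe query."""
    return FAILING_POLL_SECONDS if state is APIState.FAILING else ONLINE_POLL_SECONDS


def should_force_failing(state: APIState, result: ProbeResult) -> bool:
    """While online, only a positively reported outage knocks the API to failing."""
    return state is APIState.ONLINE and result is ProbeResult.OUTAGE


def should_attempt_reconnect(state: APIState, result: ProbeResult) -> bool:
    """While failing, anything other than a positively reported outage allows a retry."""
    return state is APIState.FAILING and result is not ProbeResult.OUTAGE
```

The fail-safe property is visible in the two predicates: `DISABLED` and `UNREACHABLE` never force the failing state and never block a reconnect attempt, so a client that cannot see the probe behaves as if the probe did not exist.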
Skip attempting submission when API is in failing state
Speeds up gameplay loading when loading is more or less doomed to fail anyway.
Will also spam notifications about submission not happening. This could probably be made smarter if we want to (e.g. notify once until API comes back online), but I'm not super sure we want that, and maybe being super loud about submission failures is the preferred way to go instead.
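For reference, the "notify once until API comes back online" alternative could look roughly like the sketch below. It is a language-neutral illustration in Python and is not what this PR does; `SubmissionGate`, `try_begin_submission`, and the notification text are all hypothetical names.

```python
class SubmissionGate:
    """Skips submission while the API is failing and notifies once per outage."""

    def __init__(self) -> None:
        self._notified = False
        self.notifications: list[str] = []  # stand-in for the game's notification overlay

    def try_begin_submission(self, api_failing: bool) -> bool:
        """Return True if submission should be attempted for this play."""
        if not api_failing:
            # API is usable again: re-arm the notification for the next outage.
            self._notified = False
            return True

        if not self._notified:
            self.notifications.append("Score will not be submitted: API is unavailable.")
            self._notified = True

        # Skip the submission attempt entirely so gameplay loading is not delayed.
        return False
```

With this shape, consecutive plays during one outage produce a single notification, and the next outage after a recovery produces a new one. The trade-off is exactly the one noted above: it is quieter, so a player is more likely to miss that scores are not being submitted.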
General structure of this looks correct.
Stable also does this neat thing where, when connectivity is unavailable, it shows a global overlay (broken chain link) in the bottom-right of the screen, making it very clear that things are offline. We already kinda do this with the avatar in the toolbar, so maybe that's enough to count as a replacement.
I would say that the notification which appears as a result of the API going offline should have a custom style at the very least (colour and icon) to make it stand out and look like a global broadcast.
Attempt was made in https://github.com/ppy/osu/pull/35752/commits/33398b9a711eb733807183a010dac42d029a0f68. See what you think.
https://github.com/user-attachments/assets/23a2247d-278c-4a88-b206-63bc0f0b45c4
Notably the score submission notifications become much more spammy because they are no longer debounced via the mechanism forwarding LogLevel.Information log entries into notifications.