webi-installers
webi-installers copied to clipboard
[Fixed] Webinstall.dev was down for several minutes.
Problem
17 minutes of downtime today June 5th from 18:39 to 18:56 UTC.
Retrospective
What happened:
- There was a typo in the
Authorization
header, so authorization was not correctly sent - Production makes many requests, quickly reaching the rate limits (which cannot easily be mocked in testing)
- The error is being thrown in an async function, which caused server restart
- The server refreshes at least one random package on start, which caused failure on start
- Successive failures in rapid succession caused
systemctl
to abort relaunching
What to do about:
- Fix the typo.
This passed review without notice. It couldn't have been reasonably caught in testing. The typo was a valid word, so it wasn't caught by spellcheck either. 🤷♂️ As humans we make mistakes. - Reconsider the error handling.
Not sure if this category of error should cause this level of failure or not. The severity of the failure made it easy to identify and, since a user can't directly invoke this sort of failure remotely, it doesn't seems to present an attack vector. - More time between hotfixes and refactors.
"While we're here, might as well..." was the really root cause. It was not necessary to switch to usingfetch
(#852) in order to solve #850, #851. Even thought the commits were distinct, the process was not. If I had waited for a truly separate review on that change, the error would certainly have been more likely to be caught (i.e. review fatigue).
Status Updates
Not sure why yet. Investigating.
Possibly related to the change in fetching github releases and a difference between the staging and production environment.