ipfs-gui
IPFS Infrastructure Status Page
Related to #80, we need a more holistic overview of the health of the ipfs.io infrastructure. We want to visualise how things are running in a way that gives a clear overview at the top level, lets you drill into more info for each specific service, and links out to other telemetry services (netdata, grafana) where sensible for the full details.
A status page of some sort has been suggested... popular public ones include
- https://status.slack.com/
- https://www.githubstatus.com/
- https://status.circleci.com/
- https://status.cloud.google.com/
Some open source solutions
- https://statusfy.co/
TODO:
- [ ] Define the list of services
  - gateway nodes
  - bootstrap nodes
  - dhtbooster nodes
  - preload nodes
  - websocket-star and webrtc signalling infra
  - nginx / http frontend
  - certbot / tls
  - DNS / dnsimple
  - ?
- [ ] Define regions, zones, datacenters
  - packet
  - where tho?
- [ ] Define metrics
  - Gateway / Nginx requests over time (current gateway load)
  - nginx timeouts over time (# requests for undiscoverable content)
  - IPFS response time for local blocks
  - IPFS response time for blocks from cluster
  - IPFS response time for DHT discovery
  - Estimated unique peerIDs in network
  - total bandwidth and average bandwidth per request
  - total infra cost?
- [ ] Define status thresholds
  - Happy fail: requests are slow because we are getting way more than usual
  - Sad fail: requests are slow because something is broken... DHT discovery time just spiked, but number of unique peers didn't
  - Budget exceeded: we hit a cost threshold and started throttling specific services
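To make the status thresholds a bit more concrete, here is a rough Python sketch of how the three states might be told apart from the metrics listed above. Everything in it is an assumption for illustration: the `Snapshot` and `Baseline` field names, the `classify` helper, and the 2x factors are made up and would need tuning against real data.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    HAPPY_FAIL = "happy fail"            # slow because load is way above usual
    SAD_FAIL = "sad fail"                # slow although load looks normal
    BUDGET_EXCEEDED = "budget exceeded"  # cost threshold hit, throttling


@dataclass
class Snapshot:
    """Current readings (hypothetical field names)."""
    requests_per_sec: float   # gateway / nginx load
    unique_peers: int         # estimated unique peerIDs in the network
    dht_discovery_ms: float   # IPFS response time for DHT discovery
    cost_usd: float           # infra spend so far this period


@dataclass
class Baseline:
    """What "usual" looks like, plus the budget (hypothetical field names)."""
    requests_per_sec: float
    unique_peers: int
    dht_discovery_ms: float
    budget_usd: float


def classify(now: Snapshot, base: Baseline,
             load_factor: float = 2.0, slow_factor: float = 2.0) -> Status:
    # Budget takes priority: once we throttle, slowness is expected.
    if now.cost_usd >= base.budget_usd:
        return Status.BUDGET_EXCEEDED
    slow = now.dht_discovery_ms > slow_factor * base.dht_discovery_ms
    if not slow:
        return Status.OK
    # Slow AND busy is the "happy" case; slow without extra load is "sad".
    busy = (now.requests_per_sec > load_factor * base.requests_per_sec
            or now.unique_peers > load_factor * base.unique_peers)
    return Status.HAPPY_FAIL if busy else Status.SAD_FAIL
```

For example, with a baseline of 100 requests/s and 200 ms DHT discovery, a reading of 110 requests/s and 900 ms would come out as a sad fail, since discovery time spiked without a matching rise in load.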
Some good status pages
https://www.githubstatus.com
https://status.circleci.com
https://status.slack.com
Interestingly, github and circle both use https://statuspage.io. I am currently trying out https://docs.statusfy.co
Here's how things could look if we go for https://statuspage.io
User view
[Screenshots: New Incident | Resolved | Details]
Operator view
[Screenshots: New incident | Resolved | Details]
All OK

I really want those health meters to pulse in a Knight Rider kind of way, but otherwise this is really nifty!
I also tried out:
- https://www.sorryapp.com/ - cheaper than statuspage.io but didn't feel as intuitive... something about it didn't click for me
  [Screenshots: New Incident | View incident]

- https://statusfy.co/ - good self-hosted option. It gives you a CLI to create incidents as markdown files. The incident status is tracked in YAML front matter, and the markdown lets you add notes and status updates. See: https://docs.statusfy.co/guide/incidents/#front-matter
The CLI builds out a static site, and then it's up to us where we want to publish it.

This could let us host it on IPFS, but I'm assuming that the network status page is the one resource we should not post on IPFS itself. We can of course host it on any static resource server. I've not explored it further, as it seems like we'd want a very comfortable and clear UI for reporting incidents; those situations are stressful enough. Creating a static site is a reliable process, and could be entirely automated via github, but I want to check in with the operators who would be using it to see what their preferences are.
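If we did automate the statusfy route, writing an incident would boil down to producing a markdown file with YAML front matter. Here's a minimal Python sketch of that idea. The `render_incident` helper is hypothetical, and the front matter keys (`title`, `date`, `severity`, `affectedsystems`, `resolved`) and the severity value are my assumptions; they should be checked against the front-matter docs linked above before relying on them.

```python
import json
from datetime import datetime, timezone


def render_incident(title, severity, affected, resolved, notes, when=None):
    """Return an incident as markdown with YAML front matter.

    The key names are assumptions, not a confirmed statusfy schema.
    """
    when = when or datetime.now(timezone.utc)
    lines = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML
        # and keeps titles containing ':' from breaking the front matter.
        f"title: {json.dumps(title)}",
        f"date: {when.isoformat()}",
        f"severity: {severity}",
        "affectedsystems:",
    ]
    lines += [f"  - {name}" for name in affected]
    lines += [
        f"resolved: {str(resolved).lower()}",
        "---",
        "",
        notes,  # free-form markdown: notes and status updates
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_incident(
        title="Gateway slow: DHT lookups timing out",
        severity="partial-outage",  # assumed value, check the docs
        affected=["gateway nodes"],
        resolved=False,
        notes="Investigating slow DHT discovery on the gateways.",
    ))
```

A script like this could run from a github workflow, with the generated file committed next to the other incidents before the static site is rebuilt.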
> This could let us host it on IPFS, but I'm assuming that the network status page is the one resource we should not post on IPFS itself.
😆 Agreed.
Main storage Cluster is missing from the services list (although I see it in your screenshots).
Also, Pinbots.
Both statuspage and statusfy seem reasonable. Are there other benefits to the self-hosted version we like, e.g. the markdown or CLI integrations?