ipfs-gui
IPFS Infrastructure Status Page
Related to #80, we need a more holistic overview of the health of the ipfs.io infrastructure. We want to visualise how things are running in a way that gives a clear overview at the top level, lets you drill into more detail for each specific service, and links out to other telemetry services (netdata, grafana) where sensible for the full details.
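One possible shape for the "top level plus drill-down" view is sketched below, assuming each service reports one of three severity levels and the top-level status is simply the worst of them. The status names, the `Service` fields and the example grafana URL are hypothetical placeholders, not an existing ipfs.io API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    # Ordered by severity so that max() returns the worst status.
    OPERATIONAL = 0
    DEGRADED = 1
    OUTAGE = 2


@dataclass
class Service:
    name: str  # e.g. "gateway", "bootstrap", "preload"
    status: Status
    # Hypothetical drill-down links to external telemetry (netdata, grafana).
    links: dict[str, str] = field(default_factory=dict)


def overall_status(services: list[Service]) -> Status:
    """Top-level status: the worst status of any individual service."""
    return max((s.status for s in services), default=Status.OPERATIONAL)


def needs_attention(services: list[Service]) -> list[str]:
    """Names of services worth drilling into: anything not fully operational."""
    return [s.name for s in services if s.status != Status.OPERATIONAL]


services = [
    Service("gateway", Status.OPERATIONAL, {"grafana": "https://grafana.example.org/d/gateway"}),
    Service("bootstrap", Status.DEGRADED),
]
print(overall_status(services).name)  # DEGRADED
print(needs_attention(services))      # ['bootstrap']
```

A "worst status wins" rollup keeps the top level honest (one broken service is never hidden by many healthy ones), while the per-service `links` leave the detailed numbers to netdata or grafana.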
A status page of some sort has been suggested. Popular public ones include:
- https://status.slack.com/
- https://www.githubstatus.com/
- https://status.circleci.com/
- https://status.cloud.google.com/
Some open-source solutions:
- https://statusfy.co/
TODO:
- [ ] Define the list of services
  - gateway nodes
  - bootstrap nodes
  - dhtbooster nodes
  - preload nodes
  - websocket-star and webrtc signalling infra
  - nginx / http frontend
  - certbot / tls
  - DNS / dnsimple
  - ?
- [ ] Define regions, zones, datacenters
  - packet
  - where tho?
- [ ] Define metrics
  - Gateway / Nginx requests over time (current gateway load)
  - nginx timeouts over time (# requests for undiscoverable content)
  - IPFS response time for local blocks
  - IPFS response time for blocks from cluster
  - IPFS response time for DHT discovery
  - Estimated unique peerIDs in network
  - Total bandwidth and average bandwidth per request
  - Total infra cost?
- [ ] Define status thresholds
  - Happy fail: requests are slow because we are getting way more of them than usual.
  - Sad fail: requests are slow because something is broken, e.g. DHT discovery time just spiked but the number of unique peers didn't.
  - Budget exceeded: we hit a cost threshold and started throttling specific services.
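The three failure modes above could in principle be told apart mechanically by comparing current metrics against a baseline (for example a trailing average). The sketch below assumes that approach; the metric field names, the 1.5× "spike" factor and the rule ordering (budget first, then DHT, then load) are illustrative assumptions, not agreed thresholds.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    OK = "ok"
    HAPPY_FAIL = "happy fail"            # slow, but because demand spiked
    SAD_FAIL = "sad fail"                # slow or degraded because something broke
    BUDGET_EXCEEDED = "budget exceeded"  # cost threshold hit, services throttled


@dataclass
class Snapshot:
    # Hypothetical metric names; values are averages over some time window.
    requests_per_sec: float  # gateway / nginx request rate
    response_ms: float       # gateway response time
    dht_discovery_ms: float  # IPFS response time for DHT discovery
    unique_peers: float      # estimated unique peerIDs in the network


SPIKE_FACTOR = 1.5  # assumption: "spiked" means more than 50% above baseline


def spiked(current: float, baseline: float) -> bool:
    return baseline > 0 and current > SPIKE_FACTOR * baseline


def classify(now: Snapshot, baseline: Snapshot, cost: float, budget: float) -> Health:
    # Budget takes priority: once throttling starts, slowness is expected.
    if cost >= budget:
        return Health.BUDGET_EXCEEDED
    # DHT discovery got slower without matching growth in peers: likely broken.
    if spiked(now.dht_discovery_ms, baseline.dht_discovery_ms) and not spiked(
        now.unique_peers, baseline.unique_peers
    ):
        return Health.SAD_FAIL
    if spiked(now.response_ms, baseline.response_ms):
        # Slow responses count as "happy" only if load grew alongside them.
        if spiked(now.requests_per_sec, baseline.requests_per_sec):
            return Health.HAPPY_FAIL
        return Health.SAD_FAIL
    return Health.OK


baseline = Snapshot(requests_per_sec=100, response_ms=200, dht_discovery_ms=1000, unique_peers=5000)
busy = Snapshot(requests_per_sec=300, response_ms=600, dht_discovery_ms=1000, unique_peers=5000)
print(classify(busy, baseline, cost=50, budget=100).value)  # happy fail
```

Comparing against a baseline rather than fixed absolute limits is one way to avoid hard-coding numbers per region or service; the real thresholds would still need to come out of the metrics discussion above.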