celo-blockchain
Review Forno dashboard in Grafana
Description
Whilst investigating a recent Forno incident, I was using the Forno dashboard in Grafana. I found that some useful information was missing and that some things were confusing.
Things to add:
- Block gas used
- Block gas price
- Block time
We do currently have some metrics that seem to cover these things, such as consensus_istanbul_blocks_gasused. However, because they are Prometheus gauges, they produce a time series rather than tying each value to a block number, so they cannot easily give us a canonical "gas used in block x" chart. The metric is also updated while a node is syncing, and nodes and clusters are not time-synchronised, so values from different nodes do not correlate. As a result, the metric tells you nothing useful when viewed at the cluster level. Using gauges for gas price and block time would produce similarly unhelpful data.
To remedy this, we need to be able to plot the above values against the block number. I don't know how to do this with Grafana, or even whether it can be done.
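The idea of plotting values against block number instead of wall-clock time can be illustrated outside Grafana. The snippet below is a minimal sketch, assuming a hypothetical per-node record (BlockSample) that carries the block number, gas used and block header timestamp; it is not how the existing metrics are exported. It collapses samples from many nodes into one entry per block and derives block time from consecutive header timestamps, so the result does not depend on when, or whether during sync, each node processed a given block. Gas price could be carried through the same way.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BlockSample:
    """One observation reported by a node (hypothetical record format)."""
    node: str           # node or cluster that reported the sample
    observed_at: float  # node wall-clock time; unsynchronised, so deliberately ignored
    number: int         # block number
    gas_used: int       # gas used by the block
    timestamp: int      # block header timestamp, in seconds


def per_block_series(samples: Iterable[BlockSample]) -> Dict[int, Tuple[int, int]]:
    """Collapse samples from all nodes into one (gas_used, timestamp) entry per block.

    Keying by block number removes any dependence on when each node happened
    to process the block. Assumes nodes never disagree on a finalized block;
    a mismatch is reported as an error.
    """
    blocks: Dict[int, Tuple[int, int]] = {}
    for s in samples:
        value = (s.gas_used, s.timestamp)
        previous = blocks.setdefault(s.number, value)
        if previous != value:
            raise ValueError(f"nodes disagree on block {s.number}: {previous} vs {value}")
    return blocks


def block_times(blocks: Dict[int, Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Block time = header timestamp minus the parent block's header timestamp.

    Only computed for blocks whose parent is also present.
    """
    result: List[Tuple[int, int]] = []
    for number in sorted(blocks):
        parent = blocks.get(number - 1)
        if parent is not None:
            result.append((number, blocks[number][1] - parent[1]))
    return result


# Example: two nodes see the same blocks at different wall-clock times,
# and node-b reports an older block while syncing.
samples = [
    BlockSample("node-a", 1000.0, 100, 21000, 1600000000),
    BlockSample("node-b", 1003.5, 100, 21000, 1600000000),
    BlockSample("node-a", 1005.0, 101, 50000, 1600000005),
    BlockSample("node-b", 2000.0, 99, 0, 1599999995),
]
blocks = per_block_series(samples)
# blocks == {99: (0, 1599999995), 100: (21000, 1600000000), 101: (50000, 1600000005)}
print(block_times(blocks))  # [(100, 5), (101, 5)]
```

Keying by block number is what makes the series canonical: two nodes reporting the same block contribute the same point, however far apart their clocks are.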
Things that were confusing/difficult:
- I was unaware of the variables in the Forno dashboard. I assumed that I was viewing data for all clusters when in fact I was only viewing data for one.
- After discovering the variables, I found that I could select clusters that made no sense for Forno, resulting in many empty graphs.
- Since you can't see all the cluster information at once, it is hard to get an overview of how the different clusters are performing, and even harder to spot inconsistent behaviour between them. This is made worse by having to scroll to the top of the dashboard to change a variable.
Let's review this dashboard and decide what we would like to change.