Add better IO metrics

Open willhickey opened this issue 2 years ago • 2 comments

Problem

Our disk metrics lack granularity, making it hard to diagnose IO bottlenecks.

Proposed Solution

Add IOPS and bandwidth metrics for each of the configurable paths:

ledger
snapshots
incremental snapshots
accounts
accounts index
accounts hash cache
logs

The underlying system data is reported at the volume level, so paths that share a volume will report identical values in most configurations. Even so, per-path metrics will let us narrow down the source of IO activity, and will also show how many volumes each node has and which directories are grouped together on the same volume.
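
To make the proposal concrete, here is a minimal sketch of one way the per-path numbers could be derived on Linux. It is an illustration only, not the validator's implementation. It assumes counters come from `/proc/diskstats` (which counts sectors in 512-byte units) and that each path maps to a block device via `st_dev`. The function names and metric keys are hypothetical. Filesystems with anonymous device numbers (for example tmpfs or btrfs) would not match an entry in `/proc/diskstats` and would need separate handling.

```python
import os
from collections import defaultdict

SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Map (major, minor) -> (reads, sectors_read, writes, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 10:
            continue  # skip blank or truncated lines
        # Fields: major minor name reads merged sectors ms writes merged sectors ...
        stats[(int(f[0]), int(f[1]))] = (int(f[3]), int(f[5]), int(f[7]), int(f[9]))
    return stats


def io_rates(before, after, interval_s):
    """IOPS and bandwidth for one device from two counter samples."""
    r0, sr0, w0, sw0 = before
    r1, sr1, w1, sw1 = after
    return {
        "read_iops": (r1 - r0) / interval_s,
        "write_iops": (w1 - w0) / interval_s,
        "read_bytes_per_s": (sr1 - sr0) * SECTOR_BYTES / interval_s,
        "write_bytes_per_s": (sw1 - sw0) * SECTOR_BYTES / interval_s,
    }


def group_paths_by_device(paths):
    """Group configured path names by the (major, minor) device backing them."""
    groups = defaultdict(list)
    for name, path in paths.items():
        dev = os.stat(path).st_dev
        groups[(os.major(dev), os.minor(dev))].append(name)
    return dict(groups)
```

Reporting the rates once per configured path, keyed by the device from `group_paths_by_device`, would produce the duplicated values described above whenever several paths share a volume.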

Jan 11 '24 18:01 willhickey

An additional configurable path is the accounts hash cache too. Can that be added to the list here?

Jan 11 '24 18:01 brooksprumo

An additional configurable path is the accounts hash cache too. Can that be added to the list here?

Added

Jan 11 '24 18:01 willhickey