feature/espresso-alerting

Open zebu-ram-panda opened this issue 6 months ago • 3 comments

Add Prometheus alerts and Grafana dashboard for Sequencer monitoring

This contribution adds a baseline monitoring setup for the Espresso Sequencer so that operators can observe its health and performance. It includes:

  1. Prometheus Alerting Rules (monitoring/prometheus/rules/sequencer-alerts.yml):
    • A predefined set of alerts covering consensus progress, leader health, P2P connectivity (Libp2p, CDN), L1 head advancement, transaction processing, and software versioning.
    • The rules use placeholders for environment-specific details, and internal-only references (such as specific runbook URLs) have been removed, so they are suitable for wider use.
  2. Grafana Dashboard (monitoring/grafana/dashboards/sequencerDashboard.json):
    • A pre-configured dashboard that visualizes key performance indicators (KPIs) from the sequencer, including consensus state, transaction throughput, peer connections, and leader-specific metrics.
    • The dashboard uses a template variable so that users can select their Prometheus datasource when viewing it. This avoids manually editing datasource UIDs in the JSON when integrating the dashboard into a new environment.
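
As a rough illustration of the datasource template variable described above (not the actual contents of sequencerDashboard.json), the sketch below shows how a dashboard with hard-coded Prometheus datasource UIDs could be rewritten to use a variable. The variable name `DS_PROMETHEUS`, the helper `templatize_datasource`, and the example dashboard are all hypothetical; the layout assumes the usual Grafana dashboard JSON model, where a variable of type `datasource` lives under `templating.list` and panels reference it as `${NAME}`.

```python
import copy
import json

# Hypothetical variable name; a common convention, not confirmed by this PR.
DS_VAR = "DS_PROMETHEUS"


def templatize_datasource(dashboard: dict, var_name: str = DS_VAR) -> dict:
    """Return a copy of `dashboard` whose Prometheus references use a variable."""
    result = copy.deepcopy(dashboard)

    # 1. Declare a "datasource" variable restricted to Prometheus datasources.
    variables = result.setdefault("templating", {}).setdefault("list", [])
    if not any(v.get("name") == var_name for v in variables):
        variables.append({
            "name": var_name,
            "label": "Prometheus datasource",
            "type": "datasource",
            "query": "prometheus",
        })

    # 2. Point every Prometheus datasource reference at that variable.
    def rewrite(node):
        if isinstance(node, dict):
            ds = node.get("datasource")
            if isinstance(ds, dict) and ds.get("type") == "prometheus":
                ds["uid"] = "${" + var_name + "}"
            for value in node.values():
                rewrite(value)
        elif isinstance(node, list):
            for item in node:
                rewrite(item)

    rewrite(result.get("panels", []))
    return result


# Made-up dashboard fragment with a hard-coded datasource UID.
example = {
    "title": "Sequencer (example)",
    "panels": [{
        "title": "Example panel",
        "datasource": {"type": "prometheus", "uid": "abc123"},
        "targets": [{
            "expr": "up",
            "datasource": {"type": "prometheus", "uid": "abc123"},
        }],
    }],
}

converted = templatize_datasource(example)
print(json.dumps(converted["templating"], indent=2))
```

The function works on a deep copy, so the input dashboard is left unchanged, and running it a second time does not add a duplicate variable.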

Together, these components provide a starting point for monitoring an Espresso Sequencer deployment, improving operational visibility and helping with troubleshooting.

zebu-ram-panda • May 09 '25 19:05

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: rob-maron
:x: nstankov-stkd

CLAassistant • May 09 '25 19:05

Sorry, I'm not sure what to do with this. Is this something you are using?

sveitser • May 12 '25 15:05

Thanks for the submission! I took a look and the alerts and dashboard look good code-wise, but I have yet to look at them in Prometheus and Grafana themselves. In the meantime, since we don't use these tools internally, would you be able to include a short README on usage?

rob-maron • May 16 '25 17:05