reth icon indicating copy to clipboard operation
reth copied to clipboard

Stage sync metrics are not updated in blockchain tree

Open onbjerg opened this issue 2 years ago • 8 comments

Describe the bug

The blockchain tree updates the relevant tables to commit stage checkpoints, but the Prometheus metrics are not updated.

The metrics are defined here:

https://github.com/paradigmxyz/reth/blob/7ca8a297a861624baa778c401fa8d17f25291213/crates/stages/src/pipeline/sync_metrics.rs#L1-L26

And they are updated in the pipeline here:

  1. When the pipeline is initially loaded https://github.com/paradigmxyz/reth/blob/7ca8a297a861624baa778c401fa8d17f25291213/crates/stages/src/pipeline/mod.rs#L134-L145
  2. When the stage checkpoints https://github.com/paradigmxyz/reth/blob/7ca8a297a861624baa778c401fa8d17f25291213/crates/stages/src/pipeline/mod.rs#L278

The blockchain tree should update these metrics when it commits.

Alternative

Since these are sync metrics, and the pipeline emits events during sync, such an event emitter could also be added to the blockchain tree, and the sync controller should be in charge of updating these metrics when the relevant events are emitted.

Steps to reproduce

  1. Do a full sync, eventually control will be handed over from the pipeline to the blockchain tree
  2. Notice that the Prometheus metrics for stage checkpoints are not updated

Node logs

No response

Platform(s)

Linux (x86), Linux (ARM), Mac (Intel), Mac (Apple Silicon), Windows (x86), Windows (ARM)

What version/commit are you on?

ab55ea5e04292f21a60a83eeed5b8542f9c55d09

Code of Conduct

  • [X] I agree to follow the Code of Conduct

onbjerg avatar May 08 '23 13:05 onbjerg

Is this issue fixed? If not, I'd like to fix it :) @onbjerg

int88 avatar Jun 05 '23 03:06 int88

Unsure, cc @shekhirin / @rkrasiuk since they are currently working on new checkpoints

onbjerg avatar Jun 05 '23 11:06 onbjerg

It's not fixed, feel free to take it! We plan to merge https://github.com/paradigmxyz/reth/pull/2982 today, and it'll probably cause some conflicts with your changes, but they should be resolvable pretty easily.

shekhirin avatar Jun 05 '23 11:06 shekhirin

According to my research, for head sync stage, the metric is not updated because the checkpoint.block_number will not update if stage is not done.

https://github.com/paradigmxyz/reth/blob/1075995efc2844a2cf14ac241cac8ca2d1c362f7/crates/stages/src/stages/headers.rs#L294-L312

we are on the same page for this issue? @onbjerg

int88 avatar Jun 08 '23 09:06 int88

No, it is not related to the stages themselves, it is related to this crate which performs the same functionality as the stages during live sync, but the metrics for those stages are not updated.

The basic outline is:

  • For historical sync, the stages are used crates/stages. The sync metrics for these stages are updated by Pipeline in that crate, so that works fine
  • For live sync (that is, when you have all of the historical blocks, and you are syncing new blocks that were just created), the blockchain-tree crate performs the sync. However, the sync metrics are not updated when they should be.

onbjerg avatar Jun 08 '23 10:06 onbjerg

@onbjerg Could you please list these metrics? :)

int88 avatar Jun 08 '23 11:06 int88

What confused me is stages only exist in pipeline, why should we update the stage metrics in live sync when pipeline is finished? @onbjerg

int88 avatar Jun 19 '23 07:06 int88

Apologies for the delayed response, I have not been good at checking GitHub notifications.

Essentially we want to update the stage metrics since those are used to report progress to the user in metrics served over HTTP (used in e.g. the Grafana dashboard).

We already update the checkpoints themselves in the database, since they are used to figure out where we have synced to and what to do next when the node is restarted. However, the metrics themselves are not updated.

The pipeline updates the metrics in a few places in its lifetime:

  1. Upon start it loads the current checkpoints and reports those metrics (any call to register_metrics): https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/stages/src/pipeline/mod.rs#LL160C25-L160C25
  2. When a stage commits a checkpoint (either during execution or unwind), it also updates the progress for that stage, which is any call to self.metrics.stage_checkpoint, e.g.: https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/stages/src/pipeline/mod.rs#L374

The blockchain tree on the other hand just uses this function: https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/storage/provider/src/providers/database/provider.rs#L977

Which is called in two places, which correspond to finalizations and unwinds in the tree:

  • Unwinds: https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/storage/provider/src/providers/database/provider.rs#L722
  • Finalizations: https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/storage/provider/src/providers/database/provider.rs#L1247

Any place these are called should also update the appropriate metrics. The behavior can just mirror Pipeline::register_metrics since the tree always advances/rolls back all stages at once.

onbjerg avatar Jun 19 '23 11:06 onbjerg