reth
reth copied to clipboard
Stage sync metrics are not updated in blockchain tree
Describe the bug
The blockchain tree updates the relevant tables to commit stage checkpoints, but the Prometheus metrics are not updated.
The metrics are defined here:
https://github.com/paradigmxyz/reth/blob/7ca8a297a861624baa778c401fa8d17f25291213/crates/stages/src/pipeline/sync_metrics.rs#L1-L26
And they are updated in the pipeline here:
- When the pipeline is initially loaded https://github.com/paradigmxyz/reth/blob/7ca8a297a861624baa778c401fa8d17f25291213/crates/stages/src/pipeline/mod.rs#L134-L145
- When the stage checkpoints https://github.com/paradigmxyz/reth/blob/7ca8a297a861624baa778c401fa8d17f25291213/crates/stages/src/pipeline/mod.rs#L278
The blockchain tree should update these metrics when it commits.
Alternative
Since these are sync metrics, and the pipeline emits events during sync, such an event emitter could also be added to the blockchain tree, and the sync controller should be in charge of updating these metrics when the relevant events are emitted.
Steps to reproduce
- Do a full sync, eventually control will be handed over from the pipeline to the blockchain tree
- Notice that the Prometheus metrics for stage checkpoints are not updated
Node logs
No response
Platform(s)
Linux (x86), Linux (ARM), Mac (Intel), Mac (Apple Silicon), Windows (x86), Windows (ARM)
What version/commit are you on?
ab55ea5e04292f21a60a83eeed5b8542f9c55d09
Code of Conduct
- [X] I agree to follow the Code of Conduct
Is this issue fixed? If not, I'd like to fix it :) @onbjerg
Unsure, cc @shekhirin / @rkrasiuk since they are currently working on new checkpoints
It's not fixed, feel free to take it! We plan to merge https://github.com/paradigmxyz/reth/pull/2982 today, and it'll probably cause some conflicts with your changes, but they should be resolvable pretty easily.
According to my research, for head sync stage, the metric is not updated because the checkpoint.block_number will not update if stage is not done.
https://github.com/paradigmxyz/reth/blob/1075995efc2844a2cf14ac241cac8ca2d1c362f7/crates/stages/src/stages/headers.rs#L294-L312
we are on the same page for this issue? @onbjerg
No, it is not related to the stages themselves, it is related to this crate which performs the same functionality as the stages during live sync, but the metrics for those stages are not updated.
The basic outline is:
- For historical sync, the stages are used
crates/stages. The sync metrics for these stages are updated byPipelinein that crate, so that works fine - For live sync (that is, when you have all of the historical blocks, and you are syncing new blocks that were just created), the
blockchain-treecrate performs the sync. However, the sync metrics are not updated when they should be.
@onbjerg Could you please list these metrics? :)
What confused me is stages only exist in pipeline, why should we update the stage metrics in live sync when pipeline is finished? @onbjerg
Apologies for the delayed response, I have not been good at checking GitHub notifications.
Essentially we want to update the stage metrics since those are used to report progress to the user in metrics served over HTTP (used in e.g. the Grafana dashboard).
We already update the checkpoints themselves in the database, since they are used to figure out where we have synced to and what to do next when the node is restarted. However, the metrics themselves are not updated.
The pipeline updates the metrics in a few places in its lifetime:
- Upon start it loads the current checkpoints and reports those metrics (any call to
register_metrics): https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/stages/src/pipeline/mod.rs#LL160C25-L160C25 - When a stage commits a checkpoint (either during execution or unwind), it also updates the progress for that stage, which is any call to
self.metrics.stage_checkpoint, e.g.: https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/stages/src/pipeline/mod.rs#L374
The blockchain tree on the other hand just uses this function: https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/storage/provider/src/providers/database/provider.rs#L977
Which is called in two places, which correspond to finalizations and unwinds in the tree:
- Unwinds: https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/storage/provider/src/providers/database/provider.rs#L722
- Finalizations: https://github.com/paradigmxyz/reth/blob/13dcfb8e6e637c80cb4360398ba9d59d4d82fb35/crates/storage/provider/src/providers/database/provider.rs#L1247
Any place these are called should also update the appropriate metrics. The behavior can just mirror Pipeline::register_metrics since the tree always advances/rolls back all stages at once.