cardano-node
cardano-node copied to clipboard
[BUG] - Mainnet epoch 528: Missed block at slot 143012062 with error TraceBlockFromFuture + Missed block at slot 143094566 with no information in node log.
Cardano Mainnet Blocks Production
Mainnet epoch 528: Missed block at slot 143012062 with error TraceBlockFromFuture + Missed block at slot 143094566 with no information in BP node log.
Good morning, IntersectMBO Team,
We encountered two missed blocks during our epoch 528 schedule. After thorough investigation, we ruled out the most common causes:
- Server time synchronization issue: Chrony NTP reported no errors or issues.
- High CPU usage: CPU utilization was only ~6% during the error occurrence.
- Other potential system errors: No anomalies were found in dmesg or syslog.
1. First missed block: Slot 143012062 - Timestamp: 2024-12-19 03:19:13 UTC - Error: TraceBlockFromFuture
In the attached BP node log (see linked JSON file and screenshots), we observe the following sequence:
TraceStartLeadershipCheck for slot 143012061 is logged. Subsequent entries detail block fetch and adoption events for slot 143012063. Finally, TraceStartLeadershipCheck for slot 143012062 appears but is immediately followed by the error TraceBlockFromFuture.
Added comments in attached log: "TraceStartLeadershipCheck for slot: 143012061" "All entries about block fetch from slot 143012063" "Missed block at slot 143012062. Note: Leadership check is logged 1.69 seconds in the past at 03:19:13.000Z, while the previous entry is from 03:19:14.690Z." "Error: TraceBlockFromFuture for slot 143012062"
-
Extraction from BP node log with comments added in json bpLogTraceBlockFromFuture.json
-
Screenshot with highlighted parts from BP log image
2. Second missed block: Slot 143094566 - Timestamp: 2024-12-20 02:14:17 UTC - Error: No information in logs
Unlike the first missed block, this one lacks any direct log entries. However, the logs do not show TraceStartLeadershipCheck for the affected slot (143094566), even though such checks are present for all surrounding slots.
- Screenshot with highlighted parts from BP log image
- Pool: pool1xs34q2z06a46nk7hl48d27dj5gzc6hh9trugw2ehs9ajsevqffx
- OS Name: Ubuntu
- OS Version: 24.04.1 LTS
- Node version: cardano-node 10.1.3 - linux-x86_64 - ghc-8.10 git rev 36871ba0cd3e86a5dbcfd6878cdb7168bb4e56a1
- CLI version: cardano-cli 10.1.1.0 - linux-x86_64 - ghc-8.10 git rev 01bda2e2cb0a70cd95067d696dbb44665f1d680a
Thank you, kind regards, Manuel
Thanks for the great report!
One potential reason for these could be a concurrent garbage collection. Do you see a TookSnapshot event nearby?
Hello, IntersectMBO Team,
I just checked the BP log, and found a TookSnapshot event approximately 4 seconds before encountering the TraceBlockFromFuture event.
{"app":[],"at":"2024-12-19T03:19:10.456Z","data":{"enclosedTime":{"tag":"RisingEdge"},"kind":"TraceSnapshotEvent.TookSnapshot","snapshot":{"kind":"snapshot"},"tip":"RealPoint (SlotNo 142969741) 54c7a620b6e5c87b5cbe099a0f836f19debe82ebd6dedf0cf6cfd7d78b0339e9"},"env":"10.1.3:36871","host":"snake-co","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"926","sev":"Info","thread":"316"}
{"app":[],"at":"2024-12-19T03:19:14.708Z","data":{"credentials":"Cardano","val":{"current slot":143012062,"kind":"TraceBlockFromFuture","tip":143012063}},"env":"10.1.3:36871","host":"snake-co","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"926","sev":"Error","thread":"334"}
Thank you, cheers, Manuel
Thanks, that then seems quite likely as the reason for the first missed slot. Reducing the performance impact of ledger snapshots is tracked in https://github.com/IntersectMBO/ouroboros-consensus/issues/868; and things should improve fairly soon with the integration of Ledger's MemPack (https://github.com/IntersectMBO/cardano-ledger/pull/4811).
Cool thank you, so lets close this one
cheers, Manuel
Good Morning,
I noticed that this issue 6058 is mentioned in https://github.com/IntersectMBO/ouroboros-consensus/issues/868
I reopen it, but feel free to close it if reopen is not needed,
kind regards, Manuel
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.
This issue was closed because it has been stalled for 120 days with no activity. Remove stale label or comment or this will be closed in 60 days.