cardano-node icon indicating copy to clipboard operation
cardano-node copied to clipboard

[BUG] - Mainnet epoch 528: Missed block at slot 143012062 with error TraceBlockFromFuture + Missed block at slot 143094566 with no information in node log.

Open asnakep opened this issue 11 months ago • 6 comments

Cardano Mainnet Blocks Production

Mainnet epoch 528: Missed block at slot 143012062 with error TraceBlockFromFuture + Missed block at slot 143094566 with no information in BP node log.

Good morning, IntersectMBO Team,

We encountered two missed blocks during our epoch 528 schedule. After thorough investigation, we ruled out the most common causes:

  1. Server time synchronization issue: Chrony NTP reported no errors or issues.
  2. High CPU usage: CPU utilization was only ~6% during the error occurrence.
  3. Other potential system errors: No anomalies were found in dmesg or syslog.

1. First missed block: Slot 143012062 - Timestamp: 2024-12-19 03:19:13 UTC - Error: TraceBlockFromFuture

In the attached BP node log (see linked JSON file and screenshots), we observe the following sequence:

TraceStartLeadershipCheck for slot 143012061 is logged. Subsequent entries detail block fetch and adoption events for slot 143012063. Finally, TraceStartLeadershipCheck for slot 143012062 appears but is immediately followed by the error TraceBlockFromFuture.

Added comments in attached log: "TraceStartLeadershipCheck for slot: 143012061" "All entries about block fetch from slot 143012063" "Missed block at slot 143012062. Note: Leadership check is logged 1.69 seconds in the past at 03:19:13.000Z, while the previous entry is from 03:19:14.690Z." "Error: TraceBlockFromFuture for slot 143012062"

2. Second missed block: Slot 143094566 - Timestamp: 2024-12-20 02:14:17 UTC - Error: No information in logs

Unlike the first missed block, this one lacks any direct log entries. However, the logs do not show TraceStartLeadershipCheck for the affected slot (143094566), even though such checks are present for all surrounding slots.

  • Screenshot with highlighted parts from BP log image
  • Pool: pool1xs34q2z06a46nk7hl48d27dj5gzc6hh9trugw2ehs9ajsevqffx
  • OS Name: Ubuntu
  • OS Version: 24.04.1 LTS
  • Node version: cardano-node 10.1.3 - linux-x86_64 - ghc-8.10 git rev 36871ba0cd3e86a5dbcfd6878cdb7168bb4e56a1
  • CLI version: cardano-cli 10.1.1.0 - linux-x86_64 - ghc-8.10 git rev 01bda2e2cb0a70cd95067d696dbb44665f1d680a

Thank you, kind regards, Manuel

asnakep avatar Dec 20 '24 14:12 asnakep

Thanks for the great report!

One potential reason for these could be a concurrent garbage collection. Do you see a TookSnapshot event nearby?

amesgen avatar Jan 06 '25 11:01 amesgen

Hello, IntersectMBO Team,

I just checked the BP log, and found a TookSnapshot event approximately 4 seconds before encountering the TraceBlockFromFuture event.

{"app":[],"at":"2024-12-19T03:19:10.456Z","data":{"enclosedTime":{"tag":"RisingEdge"},"kind":"TraceSnapshotEvent.TookSnapshot","snapshot":{"kind":"snapshot"},"tip":"RealPoint (SlotNo 142969741) 54c7a620b6e5c87b5cbe099a0f836f19debe82ebd6dedf0cf6cfd7d78b0339e9"},"env":"10.1.3:36871","host":"snake-co","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"926","sev":"Info","thread":"316"}

{"app":[],"at":"2024-12-19T03:19:14.708Z","data":{"credentials":"Cardano","val":{"current slot":143012062,"kind":"TraceBlockFromFuture","tip":143012063}},"env":"10.1.3:36871","host":"snake-co","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"926","sev":"Error","thread":"334"}

Thank you, cheers, Manuel

asnakep avatar Jan 07 '25 11:01 asnakep

Thanks, that then seems quite likely as the reason for the first missed slot. Reducing the performance impact of ledger snapshots is tracked in https://github.com/IntersectMBO/ouroboros-consensus/issues/868; and things should improve fairly soon with the integration of Ledger's MemPack (https://github.com/IntersectMBO/cardano-ledger/pull/4811).

amesgen avatar Jan 07 '25 11:01 amesgen

Cool thank you, so lets close this one

cheers, Manuel

asnakep avatar Jan 07 '25 12:01 asnakep

Good Morning,

I noticed that this issue 6058 is mentioned in https://github.com/IntersectMBO/ouroboros-consensus/issues/868

I reopen it, but feel free to close it if reopen is not needed,

kind regards, Manuel

asnakep avatar Jan 08 '25 10:01 asnakep

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

github-actions[bot] avatar Feb 08 '25 01:02 github-actions[bot]

This issue was closed because it has been stalled for 120 days with no activity. Remove stale label or comment or this will be closed in 60 days.

github-actions[bot] avatar Jun 10 '25 02:06 github-actions[bot]