erigon icon indicating copy to clipboard operation
erigon copied to clipboard

Erigon mainnet block height lagging behind OE node

Open mohoff opened this issue 2 years ago • 17 comments

System information

Erigon version: 2022.02.4-beta

Expected behaviour

Erigon not being behind in blocks

Actual behaviour

Our Erigon mainnet node is consistently behind in blocks in comparison to an OpenEthereum node. We're seeing diffs of up to 8 blocks, usually 4+ block diff a few times per hour, while we collect block heights every 1 min. Worth mentioning is that the Erigon is not mining and under light load while the OE one is under heavier load.

I believe 1-2 block difference every now and then is expected but 4-8 blocks a few times per hour seems high.

Is this a known bug that has been fixed in the meantime? Can you see by the logs that this must be networking related?

Example

Erigon node logs around block 14964637:

[INFO] [06-15|00:21:43.579] RPC Daemon notified of new headers       from=14964634 to=14964635
[INFO] [06-15|00:21:43.579] [1/16 Headers] Waiting for headers...    from=14964635
[INFO] [06-15|00:21:45.240] [1/16 Headers] Processed                 highest inserted=14964636 age=4s
[INFO] [06-15|00:21:45.518] [5/16 Execution] Completed on            block=14964636
[INFO] [06-15|00:21:48.331] Commit cycle                             in=1.316655503s
[INFO] [06-15|00:21:48.331] RPC Daemon notified of new headers       from=14964635 to=14964636
[INFO] [06-15|00:21:48.331] [1/16 Headers] Waiting for headers...    from=14964636
[INFO] [06-15|00:22:12.295] [1/16 Headers] Processed                 highest inserted=14964637 age=27s
[INFO] [06-15|00:22:14.217] [5/16 Execution] Completed on            block=14964637
[INFO] [06-15|00:22:26.141] Commit cycle                             in=5.976834537s
[INFO] [06-15|00:22:26.141] RPC Daemon notified of new headers       from=14964636 to=14964637
[INFO] [06-15|00:22:26.141] [1/16 Headers] Waiting for headers...    from=14964637
[INFO] [06-15|00:22:26.158] [1/16 Headers] Processed                 highest inserted=14964641 age=2s
[INFO] [06-15|00:22:27.132] [5/16 Execution] Completed on            block=14964641
[INFO] [06-15|00:22:28.932] [p2p] GoodPeers                          eth66=33
[INFO] [06-15|00:22:35.658] [txpool] stat                            block=14964637 pending=6020 baseFee=30000 queued=30000 alloc_mb=1671 sys_mb=5424
[INFO] [06-15|00:22:37.074] Commit cycle                             in=5.261498393s
[INFO] [06-15|00:22:37.074] RPC Daemon notified of new headers       from=14964637 to=14964641
[INFO] [06-15|00:22:37.074] [1/16 Headers] Waiting for headers...    from=14964641
[INFO] [06-15|00:22:40.295] [1/16 Headers] Processed                 highest inserted=14964642 age=15s

OpenEthereum logs during the same time/block:

2022-06-15 00:21:39 UTC  23/25 peers   433 MiB chain 0 bytes queue  RPC:  0 conn,    4 req/s,  736 µs
2022-06-15 00:21:41 UTC Imported #14964635 0x5a0e…188d (62 txs, 5.12 Mgas, 368 ms, 22.22 KiB)
2022-06-15 00:21:42 UTC `Pending` is deprecated and may be removed in future versions.
2022-06-15 00:21:45 UTC Imported #14964636 0x9e37…78b3 (94 txs, 8.69 Mgas, 605 ms, 37.62 KiB)
2022-06-15 00:22:09 UTC  23/25 peers   434 MiB chain 0 bytes queue  RPC:  0 conn,    4 req/s,  783 µs
2022-06-15 00:22:17 UTC Imported #14964638 0xbb94…10ea (80 txs, 5.88 Mgas, 537 ms, 29.65 KiB)
2022-06-15 00:22:22 UTC Imported #14964639 0x0ebc…ff8e (52 txs, 3.30 Mgas, 232 ms, 16.44 KiB)
2022-06-15 00:22:26 UTC Imported #14964641 0x329a…2294 (0 txs, 0.00 Mgas, 9 ms, 0.53 KiB)
2022-06-15 00:22:31 UTC `Pending` is deprecated and may be removed in future versions.
2022-06-15 00:22:39 UTC  23/25 peers   434 MiB chain 0 bytes queue  RPC:  0 conn,    9 req/s,  822 µs
2022-06-15 00:22:41 UTC Imported #14964642 0xec5c…1922 (238 txs, 18.33 Mgas, 1512 ms, 86.67 KiB)
2022-06-15 00:23:09 UTC  25/25 peers   435 MiB chain 0 bytes queue  RPC:  0 conn,    4 req/s,  695 µs
2022-06-15 00:23:14 UTC `Pending` is deprecated and may be removed in future versions.
2022-06-15 00:23:15 UTC `Pending` is deprecated and may be removed in future versions.
2022-06-15 00:23:16 UTC `Pending` is deprecated and may be removed in future versions.
2022-06-15 00:23:17 UTC Imported #14964643 0x471c…d092 (352 txs, 29.99 Mgas, 2090 ms, 138.12 KiB)

At the same time Erigon was at block 14964637, OE hit 14964641

mohoff avatar Jun 15 '22 01:06 mohoff

Will test this using a newer Erigon version once the node is synced

mohoff avatar Jun 15 '22 01:06 mohoff

Have the same for bsc. CPU: 5950X overclocked 4.5GHz, RAM: 128Gb, 4x SSD: 1Gb WD Black SN850 erigon compiled from latest devel branch

INFO[06-20|06:42:30.490] Commit cycle                             in=10.166724438s
INFO[06-20|06:42:30.490] RPC Daemon notified of new headers       from=18846348 to=18846351 header sending=7.271µs log sending=160ns
INFO[06-20|06:42:30.490] [1/16 Headers] Waiting for headers...    from=18846351
INFO[06-20|06:42:30.491] [1/16 Headers] Processed                 highest inserted=18846355 age=0
INFO[06-20|06:42:30.729] [6/16 Execution] Completed on            block=18846355
INFO[06-20|06:42:31.141] Timings (slower than 50ms)               Execution=220ms HashState=82ms IntermediateHashes=193ms
INFO[06-20|06:42:31.141] Tables                                   PlainState=164.2GB AccountChangeSet=1.5GB StorageChangeSet=3.9GB BlockTransaction=3.6GB TransactionLog=1.8TB FreeList=90.2MB ReclaimableSpace=90.2GB
INFO[06-20|06:42:41.568] Commit cycle                             in=10.426791655s
INFO[06-20|06:42:41.568] RPC Daemon notified of new headers       from=18846351 to=18846355 header sending=11.79µs log sending=160ns
INFO[06-20|06:42:41.568] [1/16 Headers] Waiting for headers...    from=18846355
INFO[06-20|06:42:41.569] [1/16 Headers] Processed                 highest inserted=18846358 age=0
INFO[06-20|06:42:41.815] [6/16 Execution] Completed on            block=18846358
INFO[06-20|06:42:42.225] Timings (slower than 50ms)               Execution=228ms HashState=86ms IntermediateHashes=185ms
INFO[06-20|06:42:42.225] Tables                                   PlainState=164.2GB AccountChangeSet=1.5GB StorageChangeSet=3.9GB BlockTransaction=3.6GB TransactionLog=1.8TB FreeList=90.2MB ReclaimableSpace=90.2GB
INFO[06-20|06:42:52.587] Commit cycle                             in=10.361861038s
INFO[06-20|06:42:52.587] RPC Daemon notified of new headers       from=18846355 to=18846358 header sending=7.35µs log sending=200ns
INFO[06-20|06:42:52.587] [1/16 Headers] Waiting for headers...    from=18846358
INFO[06-20|06:42:52.587] [1/16 Headers] Processed                 highest inserted=18846361 age=2s
INFO[06-20|06:42:53.356] [6/16 Execution] Completed on            block=18846361
INFO[06-20|06:42:53.692] Timings (slower than 50ms)               Bodies=529ms Execution=220ms HashState=63ms IntermediateHashes=159ms
INFO[06-20|06:42:53.692] Tables                                   PlainState=164.2GB AccountChangeSet=1.5GB StorageChangeSet=3.9GB BlockTransaction=3.6GB TransactionLog=1.8TB FreeList=90.2MB ReclaimableSpace=90.2GB
INFO[06-20|06:42:54.124] Commit cycle                             in=431.348777ms
INFO[06-20|06:42:54.124] RPC Daemon notified of new headers       from=18846358 to=18846361 header sending=6.22µs log sending=160ns
INFO[06-20|06:42:54.124] [1/16 Headers] Waiting for headers...    from=18846361
INFO[06-20|06:42:54.124] [1/16 Headers] Processed                 highest inserted=18846362 age=1s
INFO[06-20|06:42:54.399] [6/16 Execution] Completed on            block=18846362
INFO[06-20|06:42:54.524] Timings (slower than 50ms)               Bodies=187ms Execution=76ms IntermediateHashes=60ms
INFO[06-20|06:42:54.524] Tables                                   PlainState=164.2GB AccountChangeSet=1.5GB StorageChangeSet=3.9GB BlockTransaction=3.6GB TransactionLog=1.8TB FreeList=90.2MB ReclaimableSpace=90.2GB
INFO[06-20|06:42:58.237] Commit cycle                             in=3.712575954s

unlimittt avatar Jun 20 '22 06:06 unlimittt

we have couple solutions for Commit cycle speed, currently testing

AskAlexSharov avatar Jun 20 '22 08:06 AskAlexSharov

Another data point: Alpha Erigon seems lagging behind Beta Erigon as well. Seeing up to 11 block difference, few times 5-7 blocks difference over the last 24h while collecting metrics every min.

mohoff avatar Jun 24 '22 16:06 mohoff

@mohoff what if you add —snap.stop flag?

AskAlexSharov avatar Jun 25 '22 07:06 AskAlexSharov

Tested this over night but not helping @AskAlexSharov

Generally, do you recommend adding --snap.stop to Alpha nodes that finished the initial sync?

mohoff avatar Jun 29 '22 20:06 mohoff

No. --snap.stop it's just workaround if your node can't work because of some bug related to new snapshots creation.

AskAlexSharov avatar Jun 30 '22 02:06 AskAlexSharov

"but not helping" - what does it mean?

AskAlexSharov avatar Jun 30 '22 02:06 AskAlexSharov

It means we still see the same block lag as described in OP

mohoff avatar Jun 30 '22 05:06 mohoff

ah. current status: we have couple solutions for "Commit cycle" speed, currently testing

AskAlexSharov avatar Jun 30 '22 11:06 AskAlexSharov

we have this same problem using version v2022.06.03. Sometimes we are behind the current block by more than 80 blocks compared to other implementations. It always returns synced: false when this happens.

javibitso avatar Jul 01 '22 12:07 javibitso

Hi. I am running the version v2022.06.03-alpha since yesterday after the hard fork. It's reporting a similar behaviour. I had a peak of 100 blocks behind the Geth node. But the RPC method eth_syncing reported synced: false all the time. It was stuck in a block for more than 5 minutes due to the commit cycle.

MrLouzao avatar Jul 01 '22 12:07 MrLouzao

I can see commit cycles of about 17 seconds in my node:

[INFO] [07-01\|11:30:16.877] RPC Daemon notified of new headers       from=15056326 to=15056327 header sending=45.679µs log sending=227ns
[INFO] [07-01\|11:30:16.877] Commit cycle                             in=17.087272262s
[INFO] [07-01\|11:29:59.789] [snapshots] Retire Blocks                range=14960k-14961k
[INFO] [07-01\|11:29:59.789] Tables                                   PlainState=68.5GB AccountChangeSet=154.5GB StorageChangeSet=226.2GB BlockTransaction=23.1GB TransactionLog=410.7GB FreeList=60.0MB ReclaimableSpace=60.0GB
[INFO] [07-01\|11:29:59.789] Timings (slower than 50ms)               Headers=37.711s Execution=385ms HashState=498ms IntermediateHashes=741ms CallTraces=277ms AccountHistoryIndex=103ms StorageHistoryIndex=191ms LogIndex=188ms TxLookup=123ms Prune TxLookup=3m32.7s Prune Senders=1.552s
[INFO] [07-01\|11:29:58.237] [14/16 TxLookup] Prune done              in=3m32.70090061s
[INFO] [07-01\|11:29:55.714] [p2p] GoodPeers                          eth66=33

@AskAlexSharov how can I test the solutions you are talking about?

javibitso avatar Jul 01 '22 12:07 javibitso

it's in branch mdbx_erigon. but it's not ready to merge yet (some tests are in progress yet). this branch may take few days to totally eliminate "Commit cycle" spikes (see rare spikes at this time is ok).

AskAlexSharov avatar Jul 01 '22 12:07 AskAlexSharov

Same problem on 2022.07.01-alpha

Is there a estimation date for a fix? Cant use it in prod right now.

thx!

fernandezpaco avatar Jul 06 '22 08:07 fernandezpaco

We are also experiencing very frequent poor commit cycle times on v2022.07.02, in the order of several seconds. Also interesting, Timings (slower than 50ms) consistently reports Headers in the range of about 5-20 seconds, even on occasions when commit cycle times are good, less than a second. Does anyone know what is the meaning of these Headers timings?

[INFO] [07-14|09:12:12.355] RPC Daemon notified of new headers       from=15139262 to=15139263 header sending=4.856µs log sending=142ns
[INFO] [07-14|09:12:12.355] [1/16 Headers] Waiting for headers...    from=15139263
[INFO] [07-14|09:12:19.883] [p2p] GoodPeers                          eth66=25
[INFO] [07-14|09:12:28.339] [txpool] stat                            block=15139263 pending=498 baseFee=30000 queued=30000 alloc=3.1GB sys=4.6GB
[INFO] [07-14|09:12:32.355] [1/16 Headers] Wrote block headers       number=15139263 blk/second=0.000 alloc=2.3GB sys=4.6GB
[INFO] [07-14|09:12:39.161] [1/16 Headers] Processed                 highest inserted=15139264 age=28s
[INFO] [07-14|09:12:39.537] [6/16 Execution] Completed on            block=15139264
[INFO] [07-14|09:12:40.571] Timings (slower than 50ms)               Headers=26.805s Bodies=82ms Execution=274ms HashState=226ms IntermediateHashes=407ms CallTraces=79ms StorageHistoryIndex=98ms LogIndex=93ms
[INFO] [07-14|09:12:40.571] Tables                                   PlainState=69.5GB AccountChangeSet=1.5GB StorageChangeSet=2.6GB BlockTransaction=8.3GB TransactionLog=416.9GB FreeList=63.9MB ReclaimableSpace=63.9GB
[INFO] [07-14|09:12:41.873] Commit cycle                             in=1.302144423s
[INFO] [07-14|09:12:41.873] RPC Daemon notified of new headers       from=15139263 to=15139264 header sending=4.906µs log sending=149ns
[INFO] [07-14|09:12:41.873] [1/16 Headers] Waiting for headers...    from=15139264
[INFO] [07-14|09:12:54.836] [1/16 Headers] Processed                 highest inserted=15139265 age=16s
[INFO] [07-14|09:12:55.090] [6/16 Execution] Completed on            block=15139265
[INFO] [07-14|09:12:55.655] Timings (slower than 50ms)               Headers=12.963s Bodies=54ms Execution=154ms HashState=108ms IntermediateHashes=224ms LogIndex=58ms
[INFO] [07-14|09:12:55.655] Tables                                   PlainState=69.5GB AccountChangeSet=1.5GB StorageChangeSet=2.6GB BlockTransaction=8.3GB TransactionLog=416.9GB FreeList=63.9MB ReclaimableSpace=63.9GB
[INFO] [07-14|09:13:06.014] Commit cycle                             in=10.358762773s

mdben1247 avatar Jul 14 '22 07:07 mdben1247

Hi. Esterday merged to devel branch solution for this problem:

  • existing nodes need work couple days with this fix to fully fix "slow commit cycle" problem (during this time you will periodically see "slow commit cycle")

AskAlexSharov avatar Jul 20 '22 02:07 AskAlexSharov

This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] avatar Mar 04 '23 02:03 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Mar 14 '23 02:03 github-actions[bot]