go-ethereum
Geth refuses to sync after being a few months offline
Weird behaviour from geth when putting a client online after it's been offline for a few months. I would really appreciate it if someone could explain what's going on / put me on the right track.
So far happened:
- On Jan 22 did a fresh snap sync from scratch. The blockchain was synced within a matter of days. Awesome!
- Went offline for the summer. Let's call the world state before summer "S0".
- On Sept 22 went online again & wanted to go POS, so updated to geth 1.10.23 and started using prysm 3.0.0.
- After going online again, geth refuses to reconstruct the current world state "SC".
I am using the "--syncmode=snap" option. With "--syncmode=snap" enabled, I have to admit I am not sure which of the two ways, (A) or (B), geth uses to sync:
A. The full-node way:
- Picks up the world state "S0" where it was left in before summer.
- Starts executing the blocks, until the state has been updated from "S0"->"SC"
B. Snapshot download:
- Proceeds in downloading state snapshot fragments from other peers
- ..using snapshot packets, reconstructs the state from scratch into "SC"?
- ..or tries to do an incremental heal of "S0" --> "SC" using snapshot fragments (sounds silly, right? I guess I'm just imagining this)
Is there some criterion for which of the two ways geth ends up using? Say, if the state is old enough, then do (B) instead of (A)?
Obviously, I would prefer (A), since then I'm not syncing from scratch. Should I then remove "--syncmode=snap"?
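For reference, the two startup invocations I'm comparing would be roughly the following (just a sketch; all other flags omitted, and /path/to/datadir stands for my actual data directory):
# (B) snapshot download: fetch the state from peers, then heal it
geth --syncmode=snap --datadir /path/to/datadir
# (A) full-node way: execute blocks on top of the existing state
geth --syncmode=full --datadir /path/to/datadir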
Next, let's take a look at the logs:
syys 05 11:28:36 sampsa-asus geth[11610]: INFO [09-05|11:28:36.595] Imported new block headers count=1 elapsed=4.940ms number=15,476,571 hash=0cb37e..4b58b2 age=1h51m15s
syys 05 11:28:46 sampsa-asus geth[11610]: INFO [09-05|11:28:46.906] State heal in progress accounts=11,848,[email protected] slots=34,874,[email protected] codes=107,[email protected] nodes=363,163,[email protected] pending=1
syys 05 11:29:04 sampsa-asus geth[11610]: WARN [09-05|11:29:04.616] Snapshot extension registration failed peer=35ba3061 err="peer connected on snap without compatible eth support"
syys 05 11:29:08 sampsa-asus geth[11610]: INFO [09-05|11:29:08.040] State heal in progress accounts=11,848,[email protected] slots=34,874,[email protected] codes=107,[email protected] nodes=363,163,[email protected] pending=1
syys 05 11:29:08 sampsa-asus geth[11610]: INFO [09-05|11:29:08.096] Imported new block headers count=1 elapsed=5.270ms number=15,476,572 hash=eab73d..75ece5 age=1h51m42s
syys 05 11:29:08 sampsa-asus geth[11610]: WARN [09-05|11:29:08.098] Dropping unsynced node during sync id=c83d584aa6a7cf69 conn=inbound addr=146.0.216.163:40678 type=Geth/v1.10.23-stable...
syys 05 11:29:23 sampsa-asus geth[11610]: WARN [09-05|11:29:23.229] Snapshot extension registration failed peer=2b87dc19 err="peer connected on snap without compatible eth support"
syys 05 11:29:27 sampsa-asus geth[11610]: WARN [09-05|11:29:27.443] Snapshot extension registration failed peer=6de3885d err="peer connected on snap without compatible eth support"
syys 05 11:30:04 sampsa-asus geth[11610]: INFO [09-05|11:30:04.652] Imported new block headers count=1 elapsed=5.077ms number=15,476,573 hash=695338..95f981 age=1h52m27s
syys 05 11:30:05 sampsa-asus geth[11610]: INFO [09-05|11:30:05.349] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=254.59KiB throttle=1030
syys 05 11:30:48 sampsa-asus geth[11610]: INFO [09-05|11:30:48.716] Imported new block headers count=2 elapsed=5.064ms number=15,476,575 hash=664007..cdd2dd age=1h52m55s
syys 05 11:30:49 sampsa-asus geth[11610]: WARN [09-05|11:30:49.374] Pivot became stale, moving old=15,476,449 new=15,476,513
syys 05 11:30:49 sampsa-asus geth[11610]: INFO [09-05|11:30:49.504] Imported new block receipts count=64 elapsed=129.241ms number=15,476,512 hash=14ddfa..f13f36 age=2h9m6s size=8.55MiB
syys 05 11:30:49 sampsa-asus geth[11610]: INFO [09-05|11:30:49.504] State heal in progress accounts=11,848,[email protected] slots=34,874,[email protected] codes=107,[email protected] nodes=363,163,[email protected] pending=1
syys 05 11:30:55 sampsa-asus geth[11610]: INFO [09-05|11:30:55.112] Imported new block headers count=1 elapsed=4.197ms number=15,476,576 hash=413332..413f19 age=1h52m57s
syys 05 11:31:23 sampsa-asus geth[11610]: INFO [09-05|11:31:23.472] Imported new block headers count=1 elapsed=4.914ms number=15,476,577 hash=192227..5f294e age=1h52m55s
syys 05 11:31:24 sampsa-asus geth[11610]: INFO [09-05|11:31:24.524] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=260.17KiB throttle=1008
syys 05 11:31:36 sampsa-asus geth[11610]: INFO [09-05|11:31:36.180] Imported new block headers count=2 elapsed=9.148ms number=15,476,579 hash=1d9258..ef0920 age=1h52m52s
syys 05 11:32:04 sampsa-asus geth[11610]: INFO [09-05|11:32:04.529] Imported new block headers count=2 elapsed=6.110ms number=15,476,581 hash=37288d..497e7a age=1h53m14s
syys 05 11:32:07 sampsa-asus geth[11610]: INFO [09-05|11:32:07.813] Imported new block headers count=6 elapsed=17.703ms number=15,476,587 hash=dbcbde..6a6aa8 age=1h52m40s
syys 05 11:32:11 sampsa-asus geth[11610]: INFO [09-05|11:32:11.064] Imported new block headers count=2 elapsed=5.816ms number=15,476,589 hash=f78c93..d547a7 age=1h52m35s
syys 05 11:32:14 sampsa-asus geth[11610]: INFO [09-05|11:32:14.329] Imported new block headers count=1 elapsed=5.343ms number=15,476,590 hash=1d970a..dbb580 age=1h52m36s
syys 05 11:32:17 sampsa-asus geth[11610]: INFO [09-05|11:32:17.596] Imported new block headers count=1 elapsed=5.462ms number=15,476,591 hash=8efbe0..077eb1 age=1h52m28s
syys 05 11:32:24 sampsa-asus geth[11610]: INFO [09-05|11:32:24.007] Imported new block headers count=2 elapsed=13.627ms number=15,476,593 hash=9d2ae6..f64c80 age=1h51m17s
syys 05 11:32:24 sampsa-asus geth[11610]: INFO [09-05|11:32:24.562] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=372.89KiB throttle=704
syys 05 11:32:30 sampsa-asus geth[11610]: INFO [09-05|11:32:30.406] Imported new block headers count=1 elapsed=5.857ms number=15,476,594 hash=59a38c..b964ff age=1h50m21s
syys 05 11:32:33 sampsa-asus geth[11610]: INFO [09-05|11:32:33.696] Imported new block headers count=1 elapsed=5.346ms number=15,476,595 hash=a0f7d8..ccb3e3 age=1h50m1s
syys 05 11:32:40 sampsa-asus geth[11610]: INFO [09-05|11:32:40.090] Imported new block headers count=3 elapsed=5.102ms number=15,476,598 hash=961009..fc298a age=1h49m9s
syys 05 11:32:43 sampsa-asus geth[11610]: INFO [09-05|11:32:43.361] Imported new block headers count=1 elapsed=4.918ms number=15,476,599 hash=05bd57..d6f4b9 age=1h49m
syys 05 11:32:49 sampsa-asus geth[11610]: INFO [09-05|11:32:49.788] Imported new block headers count=6 elapsed=22.194ms number=15,476,605 hash=91c36f..a618eb age=1h48m28s
syys 05 11:32:53 sampsa-asus geth[11610]: INFO [09-05|11:32:53.036] Imported new block headers count=1 elapsed=4.784ms number=15,476,606 hash=d3a357..1cb4d0 age=1h48m30s
syys 05 11:32:56 sampsa-asus geth[11610]: INFO [09-05|11:32:56.313] Imported new block headers count=1 elapsed=4.866ms number=15,476,607 hash=c0fc48..6b72f0 age=1h48m30s
syys 05 11:32:59 sampsa-asus geth[11610]: INFO [09-05|11:32:59.583] Imported new block headers count=1 elapsed=5.789ms number=15,476,608 hash=f7470d..d2f3c5 age=1h48m31s
- Blocks are imported at a good pace? A few milliseconds per block.
- "Pivot became stale, moving": does this have something to do with downloading the snapshot from other peers? ..starting it all over again? Is it doing (A) or (B) (see above)?
- "State heal in progress" reports the number of accounts in Ethereum.. this should be near a hundred million at the moment? Why is the state healed/updated so slowly? (this is after being a few days online)
Let's continue:
syys 05 11:50:13 sampsa-asus geth[11610]: INFO [09-05|11:50:13.381] Disabling direct-ancient mode origin=15,477,053 ancient=14,627,238
syys 05 11:50:13 sampsa-asus geth[11610]: INFO [09-05|11:50:13.381] State heal in progress accounts=11,848,[email protected] slots=34,874,[email protected] codes=107,[email protected] nodes=363,168,[email protected] pending=1
syys 05 11:50:13 sampsa-asus geth[11610]: INFO [09-05|11:50:13.750] Imported new block headers count=9 elapsed=90.869ms number=15,477,128 hash=8cad3b..8a6686 age=1m10s ignored=66
syys 05 11:50:17 sampsa-asus geth[11610]: INFO [09-05|11:50:17.219] Imported new block headers count=1 elapsed=4.008ms number=15,477,129 hash=4a8b20..97bd42 age=1m10s
syys 05 11:50:18 sampsa-asus geth[11610]: INFO [09-05|11:50:18.430] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=262.34KiB throttle=1000
syys 05 11:50:18 sampsa-asus geth[11610]: INFO [09-05|11:50:18.451] Imported new block receipts count=10 elapsed=20.515ms number=15,477,063 hash=68f457..1a4bca age=14m52s size=2.11MiB
syys 05 11:50:18 sampsa-asus geth[11610]: INFO [09-05|11:50:18.451] State heal in progress accounts=11,848,[email protected] slots=34,874,[email protected] codes=107,[email protected] nodes=363,168,[email protected] pending=1
syys 05 11:50:25 sampsa-asus geth[11610]: INFO [09-05|11:50:25.672] Imported new block headers count=1 elapsed=6.006ms number=15,477,130 hash=5c6caa..88b273
syys 05 11:50:29 sampsa-asus geth[11610]: INFO [09-05|11:50:29.716] State heal in progress accounts=11,848,[email protected] slots=34,874,[email protected] codes=107,[email protected] nodes=363,169,[email protected] pending=19532
syys 05 11:50:40 sampsa-asus geth[11610]: WARN [09-05|11:50:40.869] Snapshot extension registration failed peer=1125cc47 err="peer connected on snap without compatible eth support"
syys 05 11:50:47 sampsa-asus geth[11610]: INFO [09-05|11:50:47.686] Imported new block headers count=1 elapsed=8.713ms number=15,477,131 hash=0145a7..0ce6dc
- What is "Disabling direct-ancient mode"? Does it have something to do with the problem at hand?
- Note the "State heal in progress" number of pending accounts jumping from 1 to 19532. Why's that?
- Note that the downloaded and verified blocks are practically caught up: "age=1m10s"
- Then we have a lot of this going on:
syys 05 16:08:19 sampsa-asus geth[11610]: WARN [09-05|16:08:19.561] Unexpected bytecode heal packet peer=905d36fa reqid=7,304,675,219,279,965,590
- What's that about? I haven't seen a proper explanation of this anywhere.
The client was running ~ 4 days until I got fed up and put it offline. Got this far:
syys 05 16:08:58 sampsa-asus geth[11610]: INFO [09-05|16:08:58.909] State heal in progress accounts=12,866,[email protected] slots=37,899,[email protected] codes=111,[email protected] nodes=371,245,[email protected] pending=188,584
I interpret this as: the downloaded and verified blocks are OK (it seems to reach the latest block occasionally), but the client is unable to reconstruct the up-to-date world state "SC".
I don't understand why. Once a block is downloaded and verified you get the state change, so what is interfering here with the state update/reconstruction?
Is there some competition/contradiction going on between (A) and (B)? (see above)
Some details on my system:
- 8 x Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
- 2TB SSD drive: "Crucial BX500"
- 12 GB RAM (& all the swap you could wish for)
(as mentioned, this setup was able to snap sync from scratch before summer)
P.S. It would be important to be able to contribute to the network without being a 24/7 available/on-call eth-client-massage service provider.. to be offline if needed, etc. Imagine if I'd been staking at this point..!
It looks like you were not fully snap synced when you shut down your node the last time. You were most likely still in the state heal phase; otherwise geth would import blocks one by one afterwards. The problem now is that you have a really old state that is not fully healed, so a lot of accounts, balances etc. are different from the current state. In the state healing phase these states are fetched one by one from the network. Since there are so many differences between you and the network, the state heal phase will likely never finish. The best way to fix this is to stop geth and remove the old state that you have (via `geth removedb`, answering `y` and then `n` so the ancient chain data is kept). This will force the node to start a new snap sync, which should finish in a couple of hours.
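Roughly, the procedure looks like this (just a sketch; the exact prompt wording may differ between versions, and /path/to/datadir is whatever you normally pass as --datadir):
# stop the running geth process first, then:
geth removedb --datadir /path/to/datadir
# answer 'y' when asked to remove the state database
# answer 'n' when asked to remove the ancient (freezer) data, so the downloaded blocks are kept
# then restart geth with your usual flags; it will start a fresh snap sync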
Regarding your PS: nodes can recover very quickly from a couple of days of downtime once they are fully synced (and healed). Since the gas limit in ethereum was voted up and is pretty high atm, nodes have to do a lot of computation, especially if they want to sync up to the network.
Thanks for the reply. But I am 100% sure that it synced.. I checked it from the geth console, etc.
I guess there's nothing else left to do than yet another snap sync. I'll keep this ticket open a few days just in case..
To confirm: once the snap sync is over, geth will behave like a normal "full" node.. there's no condition/bug? under which it would start a snap sync again? And it's a pity no one ever explains that "Unexpected bytecode heal packet"..!
Yes, a node will never start a sync again after it's been synced once. The "unexpected bytecode heal packet" is printed either when a peer sends us data that we haven't requested, or if a reply arrives after the request has already timed out (so you request something, the peer takes too long to respond, you stop tracking the request, the peer finally responds).
I'm running `geth removedb`. Should I remove the ancient database as well..?
I kept the ancient database & now it seems to work like a full node..?
syys 12 08:23:08 sampsa-asus geth[2567]: INFO [09-12|08:23:08.248] Imported new block receipts count=18 elapsed=135.821ms number=14,627,793 hash=769804..0be43a age=4mo3w2d
....
syys 12 08:23:03 sampsa-asus geth[2567]: WARN [09-12|08:23:03.989] Pivot seemingly stale, moving old=15,519,069 new=15,519,133
State heal logs missing at the moment
I've spent some time looking around and trying to understand the differences between geth's sync modes, but am still left with some open questions. For example: does a geth node that initially snap- or fast-syncs end up with the same database state as a "full node" that has executed all of the transactions since the genesis block? Or does snap sync only acquire a "digest" of the storage state of all addresses as of the sync point, such that future requests made to this node for storage as of some older block would involve the node making new requests to peers to acquire that older state? Please forgive me if any of this is malformed. I'm asking these questions having made some possibly incorrect assumptions about "everything."
Having just written these questions, I thought of a better google query, and found this: https://github.com/ethereum/go-ethereum/issues/24413
Probably a good idea to incorporate some/all of the wisdom there, into the docs.
After doing `geth removedb` (but keeping the ancient database/freezer) it's now in the state heal process again. It downloaded all missing blocks and after that the "state heal" logs started to appear.
Now, as always, logs are saturated with:
Unexpected trienode heal packet
State healing is painstakingly slow:
syys 13 16:51:30 sampsa-asus geth[1325]: INFO [09-13|16:51:30.912] State heal in progress accounts=524,[email protected] slots=1,009,[email protected] [email protected] nodes=5,091,[email protected] pending=29246
Maybe 500 000 accounts per 24h ..?
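(I've been estimating that rate by polling the sync counters from the console every few hours and diffing them by hand, roughly like this -- assuming the HTTP RPC is enabled on the default port, otherwise attach via the IPC file in the datadir:)
geth attach --exec 'eth.syncing' http://localhost:8545
# note down the heal-related counters (healedTrienodes etc., if this geth version reports them),
# then repeat a few hours later and compare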
So `geth removedb` didn't help. Now what? Should I also remove the "ancient" database and try to do snap sync?
My understanding of snap sync is:
- download the recent global state "SC0" using snap
- then that global state needs to be "healed" using the downloaded blocks: "SC0" --> blocks --> "SC1" (current global state)
Nobody knows why.. just (clueless) nerds (like myself) trying to scrape information from the internet. Do you have any "testing" lab where you put clients on & offline & try to disturb them in some private geth ethereum network? Or does "ci/cd testing" equal "merge" (I hope not).
EDIT: maybe the ancient database is corrupt..? Does the ancient database include a state trie that is loaded in the beginning & which it then tries to heal.. or is the latest state fetched with snap sync, given that I have done `geth removedb` (but not removed the ancient database)?
On a related note, I ended up with the impression that it has become quite difficult to stand up a full node instead of using centralized providers such as Alchemy/Moralis/Infura etc. because of these types of troubles (and perhaps my last attempt was when snap sync was too new). The slowness and/or trouble in this area ends up pushing dApp providers to rely on centralized resources such as Infura/Moralis/Alchemy (versus it being super easy to stand up their own nodes), which may be an issue to think about (pros and cons to each).
It would be really nice to be able to stand up a full node that can answer historical requests (gcmode archive) quickly, by acquiring that database and/or being able to forward such queries to/from peers and then verifying them on a best-effort basis over a longer timeframe. I'm not fully conversant with the security implications and/or implementation difficulties and/or bandwidth/resource management issues of doing such a thing -- but I sense it would help reduce centralization onto a handful of dApp service providers.
A similar scenario in my node as well. I have 2 geth archival nodes connected to 2 beacon nodes, and both seem to have completed the merge. But the nodes seem to be syncing very slowly. Please find below the status from both nodes.
GETH
node 0 - INFO [09-21|06:25:06.231] Imported new potential chain segment blocks=1 txs=295 mgas=22.221 elapsed=9.974s mgasps=2.228 number=15,559,291 hash=0961a7..1d9571 age=2d21h47m dirty=0.00B
node 1 - INFO [09-21|06:25:07.196] Imported new potential chain segment blocks=1 txs=371 mgas=27.281 elapsed=9.188s mgasps=2.969 number=15,557,806 hash=9b6067..887a31 age=3d2h46m dirty=0.00B
BEACON
node 0 - time="2022-09-21 06:26:27" level=info msg="Processing block batch of size 64 starting from 0xb47a08d1... 4722176/4743130 - estimated time remaining 15m39s" blocksPerSecond=22.3 peers=49 prefix=initial-sync
node 1 - time="2022-09-21 06:25:20" level=info msg="Processing block batch of size 64 starting from 0x2ff82104... 4720690/4743124 - estimated time remaining 29m19s" blocksPerSecond=12.8 peers=45 prefix=initial-sync
The slots (3 days behind the current slot) in the beacon nodes seem to be in sync with the blocks shown in the geth nodes. Can anyone confirm whether this is healthy and our sync is well and good, just a bit slow?
Now, as always, logs are saturated with:
Unexpected trienode heal packet
This has actually been fixed, but not pushed to stable (a.k.a. v1.10.xx). If you use latest (a.k.a. 1.11.xx-unstable) you should not see these messages (as much). Several issues related to snap-sync were fixed, and many cases that were previously unable to sync due to IO performance are now able to.
So, I recommend switching geth-version.
So `geth removedb` didn't help. Now what? Should I also remove the "ancient" database and try to do snap sync?
I recommend:
- Use 1.11
- Do `geth removedb`, but keep ancients intact.
@Sreeram1993
node 0 - INFO [09-21|06:25:06.231] Imported new potential chain segment blocks=1 txs=295 mgas=22.221 elapsed=9.974s mgasps=2.228 number=15,559,291 hash=0961a7..1d9571 age=2d21h47m dirty=0.00B
node 1 - INFO [09-21|06:25:07.196] Imported new potential chain segment blocks=1 txs=371 mgas=27.281 elapsed=9.188s mgasps=2.969 number=15,557,806 hash=9b6067..887a31 age=3d2h46m dirty=0.00B
These numbers aren't great -- as you can see, importing one block takes almost 10s. That is very close to the block time. Obviously, if import is slower than the block time, then you will not be able to keep up. Other than that, yeah, looks ok to me.
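If you want to keep an eye on it, a quick grep over the logs shows the recent per-block import times (a rough sketch -- this assumes the logs end up in journald under a unit named geth; adapt it to however you collect logs):
journalctl -u geth --since "1 hour ago" | grep "Imported new potential chain segment" | grep -o "elapsed=[^ ]*"
As long as those elapsed values stay well below the ~12s block time, the node can keep up.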
Oh, btw -- is there really a point in keeping this bug open? Seems like the discussion has trailed off into new areas?
Thanks for the update @holiman. Is there any way to improve the import speed? I am already using a 16TB SSD and the nodes are in AKS as well. Any solution to increase peers or network bandwidth so that the archival node stays up to date?
Increasing peers will most likely have a negative impact. I don't know what AKS means.
If you are interested in trying something new, we have a new database backend which we think will be an improvement: https://github.com/ethereum/go-ethereum/pull/24615 . Unfortunately, we don't have any conversion-tool, so the only thing one can do with it is to sync from scratch -- which is a PITA with archive node, I know, so I don't actually expect you to try it out.
Other than that, I don't have a lot of ideas.
@holiman Got it fixed and I started running. But I get this INFO message multiple times for the same block.
--GETH--
INFO [09-22|13:01:36.034] Imported new potential chain segment blocks=1 txs=210 mgas=25.110 elapsed=1m19.317s mgasps=0.317 number=15,588,279 hash=461b03..8964d1 age=2h34m37s dirty=0.00B
INFO [09-22|13:01:36.176] Imported new potential chain segment blocks=1 txs=210 mgas=25.110 elapsed=142.118ms mgasps=176.686 number=15,588,279 hash=461b03..8964d1 age=2h34m37s dirty=0.00B
INFO [09-22|13:01:36.318] Imported new potential chain segment blocks=1 txs=210 mgas=25.110 elapsed=141.422ms mgasps=177.556 number=15,588,279 hash=461b03..8964d1 age=2h34m37s dirty=0.00B
--GETH--
Certain blocks show the import just once while some show multiple INFO messages like the above. Is it corrupting the node?
I have the same issue after upgrading the geth syncmode from full to snap. 12 hours later it is still syncing, with no clear indication of when or if the process will end. There are some posts saying that the "accounts" parameter should reach the total number of Ethereum accounts (201M today) for the snap sync process to end, which in this case means it will never end. Or shall I simply be more patient?
Sep 23 12:34:04 eth2-validator-0 bash[24081]: INFO [09-23|12:34:04.490] State heal in progress accounts=169,[email protected] slots=293,[email protected] [email protected] nodes=2,170,[email protected] pending=20889
In an attempt to follow @holiman's recommendations, I failed to locate the latest version of geth (a.k.a. 1.11.xx-unstable). The last version published on GitHub is 1.10.25. A Google search on "go-ethereum v1.11.1" is not helpful either. Any help is welcome.
All it took was enough patience. Snap sync finalized after 20 hours on a GCloud 16vCPU (e2-standard-16) + balanced persistent disks. It will be downgraded to e2-standard-4 now that it is fully synced. Contrary to what is reported in other posts, the "accounts" parameter in the log line below does not need to reach the total number of Ethereum accounts for the snap sync process to finalize.
Sep 23 12:34:04 eth2-validator-0 bash[24081]: INFO [09-23|12:34:04.490] State heal in progress accounts=169,[email protected] slots=293,[email protected] codes=[email protected] nodes=2,170,[email protected] pending=20889
@Yvelo The issue is that mine is an archival sync node :D So I think I need a lot more patience. Happily it has reached 6 hours behind.
I failed to locate the latest version of geth (a.k.a. 1.11.xx-unstable). The last version published on GitHub is 1.10.25. A Google search on "go-ethereum v1.11.1" is not helpful either. Any help is welcome.
Didn't see this until now. On any of our "Releases" pages, e.g. https://github.com/ethereum/go-ethereum/releases/tag/v1.10.25 , we link to
Pre-built binaries for all platforms on our downloads page.
On that download-page, we have 'stable' and 'develop' builds.
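For example, on Linux you could grab a develop build along these lines (the archive name below is a placeholder -- copy the actual link for your platform from the 'develop' table on the downloads page):
wget https://gethstore.blob.core.windows.net/builds/<develop-build-archive>.tar.gz   # placeholder, copy the real link
tar xzf <develop-build-archive>.tar.gz
./<extracted-directory>/geth version   # should report a 1.11.x-unstable version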
I am doing sync from scratch with geth version: 1.10.25-stable
With this command line:
geth --cache=12048 --http --datadir /path/to/geth/snap-data --authrpc.jwtsecret /path/to/jwt.hex
The datadir is now about 605G, and the "State heal in progress" started about 2 days ago; the logs are now showing:
INFO [10-18|00:41:59.699] State heal in progress accounts=1,800,[email protected] slots=4,221,[email protected] [email protected] nodes=21,541,[email protected] pending=26107
So it is progressing at about 1M accounts per day? At this rate, I need to wait 200 days for it to be fully synced to 200M accounts?
@Yvelo can you share the number of accounts shown in your logs when it's fully synced? Is it around 169K accounts?
What indicator in the log can be used to estimate the time for the "State heal" to finish?
If I run this command:
geth attach http://localhost:8545
and type:
> eth.syncing
{
currentBlock: 15773502,
healedBytecodeBytes: 20841216,
healedBytecodes: 2605,
healedTrienodeBytes: 6011477028,
healedTrienodes: 21616064,
healingBytecode: 0,
healingTrienodes: 1153,
highestBlock: 15773610,
startingBlock: 0,
syncedAccountBytes: 45757800173,
syncedAccounts: 192540876,
syncedBytecodeBytes: 4627746078,
syncedBytecodes: 700408,
syncedStorage: 814610039,
syncedStorageBytes: 174772608597
}
There is no indication of what the "highest" number of trie nodes would be?
So it is progressing at about 1M accounts per day? At this rate, I need to wait 200 days for it to be fully synced to 200M accounts?
No, that's not how it works. But I'm curious, why are you using 1.10.25? As I've recommended above, please use latest, a.k.a. 1.11.xx, which has a lot of snap-sync related fixes.
There is no indication of what the "highest" number of trie nodes would be?
There's no point trying to infer the state of the sync based on how many nodes or accounts you have synced. I know it's tempting (users compare those numbers all the time), but it's not a good metric.
It's like two people walking up an escalator that is going down: compare how many steps each takes to reach the top. They walk at different speeds, and one started near the bottom, the other close to the top.
Thanks for the prompt reply, @holiman!
I used 1.10.25 because I started using geth around 5 days ago without knowing about this bug; I only read about it now. Hearing that I should delete everything and start from scratch again feels discouraging.
If I were to use 1.11.xx, what is the expected time to sync from scratch to full? Should I use the same command line as above? or is there a faster way to do it? perhaps like a checkpoint sync?
I am wondering, so there is no good metric at all to tell what the ETA for the state heal is? I don't need to compare with other people if the log shows any indicator of the ETA ...
Hearing that I should delete everything and start from scratch again feels discouraging.
Just switch then, you don't have to restart from scratch.
If I were to use 1.11.xx, what is the expected time to sync from scratch to full? Should I use the same command line as above? or is there a faster way to do it? perhaps like a checkpoint sync?
A couple of days. Yes. No. No.
I am wondering, so there is no good metric at all to tell what the ETA for the state heal is? I don't need to compare with other people if the log shows any indicator of the ETA ...
No. If there were, you can rest assured we would have added it and displayed it promptly to the users. It's something that users have requested consistently since 2016.
I cannot install 1.11.xx from brew on MacOS using this doc:
https://geth.ethereum.org/docs/install-and-build/installing-geth
Running:
brew install ethereum --devel
I got:
Error: invalid option: --devel
If I go to the downloads page:
https://geth.ethereum.org/downloads/
I can't seem to find the 1.11 on the "Develop builds" section?
It looks empty.
When will the 1.11 be released into stable?
I can't seem to find the 1.11 on the "Develop builds" section?
Hm, yes, that's some bug in our pipeline. We'll look into it.
Fixed, macos binaries available again
It is quite a frustrating experience when using geth to sync from scratch without any indicator of when it will complete. The logs did print the number of accounts: mine showing about 1M accounts, however AFAIK there are 200M accounts, and I've been syncing for 4+ days nonstop. At this rate, it will take 1+ years to fully sync. I am not comfortable using the unstable 1.11 version; I don't know what other issues it might have, and I don't want to risk it for now.
So I gave besu a try, syncing from scratch using checkpoint, and it finished in a bit less than 2 days using this command:
besu \
--sync-mode=X_CHECKPOINT \
--data-storage-format=BONSAI \
--data-path=besu-data-checkpoint \
--rpc-http-enabled \
--engine-jwt-enabled=true \
--engine-jwt-secret=/path/to/jwt.hex \
--engine-host-allowlist="*" \
--metrics-enabled \
--Xplugin-rocksdb-high-spec-enabled
The logs are also very clear; they show:
2022-10-20 07:25:35.524-07:00 | EthScheduler-Services-26 (batchPersistTrieNodeData) | INFO | SnapsyncMetricsManager | Healed 15342826 world state nodes
The current total number of nodes is 15790393, which is how I know it's nearly finished.
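(15,342,826 healed out of 15,790,393 total nodes is roughly 97%, so it was clearly close to done.)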
Compare that with the log from geth:
INFO [10-18|00:41:59.699] State heal in progress accounts=1,800,[email protected] slots=4,221,[email protected] [email protected] nodes=21,541,[email protected] pending=26107
The number of nodes is 21,541,853? Perhaps it's referring to another data structure? I don't know how to interpret that to estimate the completion time?
I hope my feedback can make geth logs better.
The logs did print the number of accounts: mine showing about 1M accounts, however AFAIK there are 200M accounts, and I've been syncing for 4+ days nonstop. At this rate, it will take 1+ years to fully sync.
As I have already said: there's no point trying to infer the state of the sync based on how many nodes or accounts you have synced. I know it's tempting (users compare those numbers all the time), but it's not a good metric. You cannot extrapolate the remaining state-heal time from those numbers.
So I gave besu a try, syncing from scratch using checkpoint, and it finished in a bit less than 2 days
Good to hear!
I don't know how to interpret that to estimate the completion time?
No, as I've said, that's not the point. If it could be done, we would be doing it already, and show you the estimate.
I have a similar problem. When do you plan to add this fix to the ethereum/client-go stable docker tag?