go-ethereum
FYI - Sync mode SNAP does not work
I run several geth nodes, but I've always used sync mode fast. I am building a new machine, so I figured I would try snap (the new default). This is running on an HP DL380 G5 with 24 GB of RAM and a 1TB SSD.
Using snap sync, geth goes into these modes of compacting the database, which completely halts the sync process, sometimes for over an hour. But I let it continue. Now, this morning, my sync appears to be stuck in "state heal in progress", and it looks like it's been there for hours. I give up; I am going back to fast.
Compacting means geth is writing data faster than leveldb can handle, so leveldb pushes back a bit. It's not a fault; it's the fact that snap sync delivers data a lot faster than 'fast' sync does.
I am having the same issue: ever since the latest update, I cannot sync my node completely.

Should I go back to fast sync, or wipe the db and try again? Is there anything I can do to help gather more info so the geth team can understand why?
The nodes-count is increasing -- it's progressing.
Is the "Unexpected trienode heal packet" warning normal? It is about 90% of my logs at this time. If it is normal, should we make it an "INFO" instead?
No, that's not an expected error (not that many of them)
This is my second time doing a fresh sync, and I am getting many "Unexpected trienode heal packet" warnings. Is there anything incorrect in my setup that would lead to this?
WARN [07-18|16:45:26.500] Unexpected trienode heal packet peer=a0378d3b reqid=4,622,035,716,730,946,788
WARN [07-18|16:45:26.565] Unexpected trienode heal packet peer=d9d3b409 reqid=5,186,876,070,523,757,744
WARN [07-18|16:45:26.784] Unexpected trienode heal packet peer=5532d114 reqid=4,913,062,635,039,342,268
WARN [07-18|16:45:27.002] Unexpected trienode heal packet peer=765491aa reqid=8,537,381,729,530,772,935
WARN [07-18|16:45:27.134] Unexpected trienode heal packet peer=17935a2e reqid=398,999,258,026,622,278
WARN [07-18|16:45:27.285] Unexpected trienode heal packet peer=fb395765 reqid=2,540,630,659,186,956,989
WARN [07-18|16:45:27.394] Unexpected trienode heal packet peer=6d63f63c reqid=3,060,663,354,523,276,049
WARN [07-18|16:45:29.757] Unexpected trienode heal packet peer=a8523f94 reqid=9,026,963,612,786,922,325
WARN [07-18|16:45:29.859] Unexpected trienode heal packet peer=889cc785 reqid=1,566,881,240,675,074,312
WARN [07-18|16:45:30.026] Unexpected trienode heal packet peer=a6d04daf reqid=2,158,928,981,022,095,841
WARN [07-18|16:45:30.152] Unexpected trienode heal packet peer=9e9cac22 reqid=3,592,232,586,991,666,835
WARN [07-18|16:45:31.019] Unexpected trienode heal packet peer=5d8589a4 reqid=393,438,786,560,958,185
WARN [07-18|16:45:31.181] Unexpected trienode heal packet peer=91bbf60e reqid=4,523,948,219,739,688,930
WARN [07-18|16:45:31.372] Unexpected trienode heal packet peer=d775bde9 reqid=4,708,330,904,872,538,623
WARN [07-18|16:45:32.379] Unexpected trienode heal packet peer=c4532ff9 reqid=6,874,370,675,488,852,123
WARN [07-18|16:45:32.380] Unexpected trienode heal packet peer=4fa42e20 reqid=3,267,196,924,822,701,039
WARN [07-18|16:45:32.542] Unexpected trienode heal packet peer=f1b13a02 reqid=884,270,897,609,630,641
INFO [07-18|16:45:43.977] State heal in progress [email protected] [email protected] [email protected] nodes=397,[email protected] pending=123,495
WARN [07-18|16:45:43.978] Unexpected trienode heal packet peer=5d664149 reqid=3,710,167,488,769,797,833
WARN [07-18|16:45:43.981] Unexpected trienode heal packet peer=e3dbfe41 reqid=2,073,374,780,961,819,677
WARN [07-18|16:45:43.983] Unexpected trienode heal packet peer=bb71328b reqid=8,121,458,505,920,535,863
WARN [07-18|16:45:43.985] Unexpected trienode heal packet peer=a1ea248b reqid=267,783,068,321,922,388
WARN [07-18|16:45:43.991] Unexpected trienode heal packet peer=a5201f7c reqid=1,439,199,076,782,710,385
WARN [07-18|16:45:44.056] Unexpected trienode heal packet peer=61551118 reqid=3,545,386,287,666,985,539
WARN [07-18|16:45:44.056] Unexpected trienode heal packet peer=85c41b4d reqid=2,200,301,475,214,792,901
WARN [07-18|16:45:44.057] Unexpected trienode heal packet peer=cb4a957a reqid=151,712,146,799,871,033
WARN [07-18|16:45:44.062] Unexpected trienode heal packet peer=8320ab75 reqid=7,465,864,124,670,898,925
WARN [07-18|16:45:44.064] Unexpected trienode heal packet peer=3319911a reqid=1,676,101,697,579,000,381
WARN [07-18|16:45:44.065] Unexpected trienode heal packet peer=bc0a22b2 reqid=7,641,458,910,862,074,927
WARN [07-18|16:45:44.125] Unexpected trienode heal packet peer=3fe6f503 reqid=7,863,934,179,564,415,364
WARN [07-18|16:45:44.129] Unexpected trienode heal packet peer=22350643 reqid=4,517,462,049,592,695,959
WARN [07-18|16:45:44.130] Unexpected trienode heal packet peer=6db8fffa reqid=3,966,230,954,419,389,835
WARN [07-18|16:45:44.452] Unexpected trienode heal packet peer=b6dc91f8 reqid=1,988,555,966,599,462,679
WARN [07-18|16:45:44.502] Unexpected trienode heal packet peer=dd94e72d reqid=4,450,099,697,928,108,082
WARN [07-18|16:45:44.504] Unexpected trienode heal packet peer=5b858358 reqid=8,925,946,666,813,154,737
WARN [07-18|16:45:44.509] Unexpected trienode heal packet peer=23a85aa5 reqid=3,080,196,199,118,744,386
WARN [07-18|16:45:44.511] Unexpected trienode heal packet peer=eba98804 reqid=8,349,544,363,420,424,856
WARN [07-18|16:45:44.513] Unexpected trienode heal packet peer=ea43a6fe reqid=1,184,655,664,638,635,660
WARN [07-18|16:45:44.516] Unexpected trienode heal packet peer=12b7db81 reqid=5,793,738,489,717,339,364
WARN [07-18|16:45:44.516] Unexpected trienode heal packet peer=e70811b4 reqid=2,177,353,076,724,056,341
WARN [07-18|16:45:44.763] Unexpected trienode heal packet peer=22e17f9c reqid=8,853,990,770,070,346,560
WARN [07-18|16:45:44.811] Unexpected trienode heal packet peer=e6382c70 reqid=739,770,737,457,955,743
INFO [07-18|16:45:44.850] Imported new block headers count=1 elapsed=6.160ms number=12,851,749 hash=7a3243..6dae55 age=1m23s
WARN [07-18|16:45:44.871] Unexpected trienode heal packet peer=cd828ba4 reqid=8,798,951,724,798,149,997
WARN [07-18|16:45:44.873] Unexpected trienode heal packet peer=6e5635a0 reqid=8,705,205,715,673,999,671
WARN [07-18|16:45:44.875] Unexpected trienode heal packet peer=cbdd12a9 reqid=1,750,396,375,655,342,341
WARN [07-18|16:45:44.876] Unexpected trienode heal packet peer=9f0aaa30 reqid=4,072,861,921,597,398,313
WARN [07-18|16:45:44.879] Unexpected trienode heal packet peer=a0378d3b reqid=8,386,183,962,563,361,933
WARN [07-18|16:45:44.879] Unexpected trienode heal packet peer=d9d3b409 reqid=8,295,232,788,244,893,801
WARN [07-18|16:45:44.882] Unexpected trienode heal packet peer=5532d114 reqid=3,623,043,430,561,675,816
WARN [07-18|16:45:44.911] Unexpected trienode heal packet peer=765491aa reqid=7,752,667,725,631,141,893
WARN [07-18|16:45:44.958] Unexpected trienode heal packet peer=17935a2e reqid=4,857,961,192,605,514,856
WARN [07-18|16:45:45.009] Unexpected trienode heal packet peer=fb395765 reqid=2,054,719,662,305,818,389
WARN [07-18|16:45:45.058] Unexpected trienode heal packet peer=6d63f63c reqid=6,657,913,199,915,039,014
WARN [07-18|16:45:45.081] Unexpected trienode heal packet peer=a8523f94 reqid=7,638,890,637,230,011,266
WARN [07-18|16:45:45.114] Unexpected trienode heal packet peer=889cc785 reqid=2,929,668,062,071,838,105
WARN [07-18|16:45:45.169] Unexpected trienode heal packet peer=a6d04daf reqid=8,942,089,650,430,097,371
WARN [07-18|16:45:45.171] Unexpected trienode heal packet peer=9e9cac22 reqid=4,503,892,313,274,969,736
WARN [07-18|16:45:45.229] Unexpected trienode heal packet peer=5d8589a4 reqid=3,594,601,607,111,485,883
WARN [07-18|16:45:45.231] Unexpected trienode heal packet peer=91bbf60e reqid=5,875,472,871,662,902,193
WARN [07-18|16:45:45.259] Unexpected trienode heal packet peer=d775bde9 reqid=1,079,832,331,273,526,425
INFO [07-18|16:45:45.378] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=243.11KiB throttle=1079
I checked again: 93% of my logs are "Unexpected trienode heal packet".
If there is anything I can do to help gather more logs to understand how to fix it, let me know.
@mohamedmansour
The "State heal in progress" stage takes several hours depending on your hardware specs. How many CPU cores do you have? What is your network bandwidth?
I have 4 CPUs, 16GB RAM, 2Gbit network on Azure VM https://azure.microsoft.com/en-us/blog/introducing-b-series-our-new-burstable-vm-size/ (Standard_B4ms) = $140/month
AWS c6g.xlarge. That's 4 high-performance cores and 16GiB of RAM, and I added an EBS volume with 6000 IOPS and 1000MiB/s throughput, on up to 10 Gb networking. One night of snap sync.
WARN [07-23|07:18:10.749] Unexpected trienode heal packet peer=a6d04daf reqid=5,628,752,454,634,351,532
WARN [07-23|07:18:10.749] Unexpected trienode heal packet peer=bfdaf535 reqid=2,129,613,245,800,917,217
WARN [07-23|07:18:10.759] Unexpected trienode heal packet peer=30ac3a13 reqid=2,746,357,763,380,049,576
WARN [07-23|07:18:12.273] Unexpected trienode heal packet peer=e28df4e4 reqid=1,660,082,010,578,520,634
WARN [07-23|07:18:13.454] Unexpected trienode heal packet peer=580f3239 reqid=38,957,340,314,670,213
INFO [07-23|07:18:18.379] State heal in progress accounts=228,[email protected] slots=474,[email protected] [email protected] nodes=2,520,[email protected] pending=22518
INFO [07-23|07:18:18.821] Imported new block headers count=1 elapsed=5.535ms number=12,881,249 hash=1a723b..7d6a91
WARN [07-23|07:18:21.471] Checkpoint challenge timed out, dropping id=4537608e56285b37 conn=inbound addr=65.21.95.47:34618 type=erigon/v2021.07.4/li...
INFO [07-23|07:18:26.424] State heal in progress accounts=228,[email protected] slots=474,[email protected] [email protected] nodes=2,521,[email protected] pending=22194
WARN [07-23|07:18:26.453] Unexpected trienode heal packet peer=e28df4e4 reqid=7,019,785,347,053,342,247
WARN [07-23|07:18:31.475] Unexpected trienode heal packet peer=30ac3a13 reqid=8,104,986,034,277,065,248
WARN [07-23|07:18:31.492] Unexpected trienode heal packet peer=805b1ebf reqid=847,938,211,426,238,144
WARN [07-23|07:18:32.924] Unexpected trienode heal packet peer=436cfb8d reqid=5,826,258,251,133,243,579
INFO [07-23|07:18:35.204] State heal in progress accounts=228,[email protected] slots=474,[email protected] [email protected] nodes=2,523,[email protected] pending=22234
WARN [07-23|07:18:35.205] Unexpected trienode heal packet peer=580f3239 reqid=2,897,966,948,454,458,237
WARN [07-23|07:18:35.231] Unexpected trienode heal packet peer=e28df4e4 reqid=6,338,490,710,152,459,505
WARN [07-23|07:18:35.240] Unexpected trienode heal packet peer=0f8d93b7 reqid=6,558,336,474,454,724,048
WARN [07-23|07:18:36.250] Unexpected trienode heal packet peer=c34194de reqid=4,258,489,952,416,859,887
WARN [07-23|07:18:36.253] Unexpected trienode heal packet peer=61551118 reqid=2,074,155,154,236,670,882
WARN [07-23|07:18:36.255] Unexpected trienode heal packet peer=54080ee5 reqid=6,077,024,708,527,433,002
WARN [07-23|07:18:37.657] Unexpected trienode heal packet peer=01f352da reqid=9,041,030,120,532,686,969
WARN [07-23|07:18:38.767] Unexpected trienode heal packet peer=a6d04daf reqid=9,015,027,624,447,844,791
WARN [07-23|07:18:39.993] Unexpected trienode heal packet peer=30ac3a13 reqid=708,368,138,623,231,300
INFO [07-23|07:18:40.155] Imported new block headers count=1 elapsed=5.390ms number=12,881,250 hash=75e84f..8e55ae
INFO [07-23|07:18:40.196] Imported new block headers count=1 elapsed=5.099ms number=12,881,251 hash=203c55..a9c8ba
INFO [07-23|07:18:40.299] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=243.22KiB throttle=1078
INFO [07-23|07:18:43.248] State heal in progress accounts=229,[email protected] slots=475,[email protected] [email protected] nodes=2,525,[email protected] pending=22579
INFO [07-23|07:18:51.645] State heal in progress accounts=229,[email protected] slots=475,[email protected] [email protected] nodes=2,527,[email protected] pending=22001
INFO [07-23|07:18:59.679] State heal in progress accounts=229,[email protected] slots=475,[email protected] [email protected] nodes=2,529,[email protected] pending=22249
INFO [07-23|07:19:04.571] Imported new block headers count=1 elapsed=5.083ms number=12,881,252 hash=ade0fb..eb0a79
INFO [07-23|07:19:07.654] Imported new block headers count=1 elapsed=5.050ms number=12,881,253 hash=69ff20..5b4202
INFO [07-23|07:19:08.080] State heal in progress accounts=230,[email protected] slots=475,[email protected] [email protected] nodes=2,530,[email protected] pending=22411
INFO [07-23|07:19:10.778] Imported new block headers count=1 elapsed=5.106ms number=12,881,254 hash=ce4e76..b337cd
WARN [07-23|07:19:15.211] Unexpected bytecode heal packet peer=a6d04daf reqid=5,663,218,223,262,397,112
INFO [07-23|07:19:16.520] State heal in progress accounts=230,[email protected] slots=475,[email protected] [email protected] nodes=2,532,[email protected] pending=22693
WARN [07-23|07:19:21.465] Unexpected bytecode heal packet peer=54080ee5 reqid=3,077,866,949,983,522,985
INFO [07-23|07:19:24.589] State heal in progress accounts=230,[email protected] slots=476,[email protected] [email protected] nodes=2,533,[email protected] pending=22916
INFO [07-23|07:19:29.071] Imported new block headers count=1 elapsed=5.203ms number=12,881,255 hash=260e4c..1afdb0
INFO [07-23|07:19:32.599] State heal in progress accounts=230,[email protected] slots=476,[email protected] [email protected] nodes=2,535,[email protected] pending=23256
INFO [07-23|07:19:40.710] State heal in progress accounts=230,[email protected] slots=476,[email protected] [email protected] nodes=2,537,[email protected] pending=23830
INFO [07-23|07:19:41.280] Imported new block headers count=1 elapsed=5.200ms number=12,881,256 hash=9d6a30..18ed89
INFO [07-23|07:19:41.330] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=236.02KiB throttle=1111
INFO [07-23|07:19:48.871] State heal in progress accounts=231,[email protected] slots=476,[email protected] [email protected] nodes=2,539,[email protected] pending=24121
INFO [07-23|07:19:56.999] State heal in progress accounts=231,[email protected] slots=477,[email protected] [email protected] nodes=2,541,[email protected] pending=24316
INFO [07-23|07:20:05.161] State heal in progress accounts=231,[email protected] slots=478,[email protected] [email protected] nodes=2,543,[email protected] pending=23817
INFO [07-23|07:20:13.392] State heal in progress accounts=231,[email protected] slots=478,[email protected] [email protected] nodes=2,545,[email protected] pending=23275
INFO [07-23|07:20:21.912] State heal in progress accounts=231,[email protected] slots=479,[email protected] [email protected] nodes=2,548,[email protected] pending=22951
INFO [07-23|07:20:29.985] State heal in progress accounts=231,[email protected] slots=479,[email protected] [email protected] nodes=2,550,[email protected] pending=22448
INFO [07-23|07:20:29.992] Imported new block headers count=1 elapsed=5.098ms number=12,881,257 hash=830c05..7bd316 age=1m21s
WARN [07-23|07:20:34.268] Unexpected trienode heal packet peer=0f8d93b7 reqid=5,110,344,849,506,036,345
WARN [07-23|07:20:34.624] Unexpected trienode heal packet peer=c34194de reqid=176,407,516,446,904,836
WARN [07-23|07:20:34.875] Unexpected trienode heal packet peer=61551118 reqid=7,745,011,524,738,817,427
WARN [07-23|07:20:36.752] Unexpected trienode heal packet peer=01f352da reqid=8,576,927,767,064,613,316
WARN [07-23|07:20:37.340] Unexpected trienode heal packet peer=a6d04daf reqid=7,790,792,082,625,331,242
INFO [07-23|07:20:38.767] State heal in progress accounts=232,[email protected] slots=480,[email protected] [email protected] nodes=2,551,[email protected] pending=22355
WARN [07-23|07:20:40.802] Unexpected trienode heal packet peer=580f3239 reqid=2,584,556,318,708,171,106
WARN [07-23|07:20:41.044] Unexpected trienode heal packet peer=e28df4e4 reqid=4,683,100,952,475,139,577
WARN [07-23|07:20:41.063] Unexpected trienode heal packet peer=0f8d93b7 reqid=7,292,729,526,860,797,397
INFO [07-23|07:20:46.982] State heal in progress accounts=232,[email protected] slots=480,[email protected] [email protected] nodes=2,553,[email protected] pending=22692
INFO [07-23|07:20:55.447] State heal in progress accounts=232,[email protected] slots=481,[email protected] [email protected] nodes=2,556,[email protected] pending=22977
INFO [07-23|07:21:00.452] Imported new block headers count=1 elapsed=6.045ms number=12,881,258 hash=793478..ba029d age=1m33s
INFO [07-23|07:21:01.368] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=237.05KiB throttle=1106
INFO [07-23|07:21:03.583] State heal in progress accounts=232,[email protected] slots=482,[email protected] [email protected] nodes=2,558,[email protected] pending=22942
INFO [07-23|07:21:12.387] State heal in progress accounts=232,[email protected] slots=483,[email protected] [email protected] nodes=2,560,[email protected] pending=22521
INFO [07-23|07:21:20.939] State heal in progress accounts=232,[email protected] slots=484,[email protected] [email protected] nodes=2,562,[email protected] pending=22119
INFO [07-23|07:21:29.133] State heal in progress accounts=232,[email protected] slots=484,[email protected] [email protected] nodes=2,564,[email protected] pending=21908
Same result as @mohamedmansour (also, hey, love watchtheburn ❤️! I happen to be building something very similar).
If I'm reading the resource use correctly (everything way under capacity, chain data only 376GiB) while the node count keeps going up, then this node is now completely constrained by how quickly it can get the data it needs from other nodes? From the looks of it, I could go down to a machine half this size.
Sounds like what @mohamedmansour and I are seeing is perhaps not entirely expected but unrelated to the bug flagged in this issue. Correct?
Unrelated to the issue, in case it helps anyone: the machine is mostly CPU-constrained during the sync (fascinating!), sits at 50% RAM use, and the IO I reserved was way overkill; it never crossed 3000 IOPS / 180MiB/s.
Heh, doing a fast sync now, the machine above is averaging 80% CPU, 80% memory, 100-250MB/s IO, and 200Mb/s network; it's basically on fire 🔥. Hats off to the Geth team for the efficiency of snap sync 👌.
Let me know if I should open a new issue if fast sync ends up beating snap sync 😬.
I am doubling my cloud volume now to 20K IOPS and will see how state healing does when it gets to that stage. Bare-metal VMs are perhaps going to be the solution. I wish Geth would honestly log that the hardware is not good enough, if it can detect that.
Snap sync done 👌. It looks like 3 days. Surprisingly only 384G of disk use. Fast sync is done too, 355G used. I'm not sure which was faster, but in any case the difference seems quite small for these 'medium power' machines.
Seems to work fine for me with the above machine 🙌 .
Snap sync took 1 day on an AWS c6g.xlarge machine, while it may run 3 days on the AWS a1.xlarge, which is only 20-30% lower in CPU performance.
4 high-speed (not low-end) CPU cores are the bare minimum for the latest Geth version.
What is the "target" number of accounts/slots/codes/nodes, currently? Interested to try to estimate my ETA ;-)
FWIW: My last State heal in progress log line was accounts=803,[email protected] slots=1,760,[email protected] [email protected] nodes=8,983,[email protected] pending=0. After that it was Rebuilding state snapshot (which seems to need another ~ 24 hours).
So is there some metric we can look at to understand (absolute) sync progress? Some target number it needs to hit? Which? Thank you!
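As far as this thread shows, geth publishes no fixed target number for accounts/slots/nodes; the counters in "State heal in progress" (and in eth.syncing, shown later in this thread) only ever move forward. One rough way to gauge movement is to sample the healedTrienodes counter twice and compute a rate. A minimal Python sketch with made-up sample numbers (the helper name heal_rate is mine, not geth's; this gives a rate, not a real ETA, since the total is unknown):

```python
def heal_rate(nodes_t0: int, nodes_t1: int, seconds: float) -> float:
    """Trie nodes healed per second between two manual samples
    of the healedTrienodes counter from geth's eth.syncing output."""
    if seconds <= 0:
        raise ValueError("samples must be taken at different times")
    return (nodes_t1 - nodes_t0) / seconds

# Illustrative numbers only (not from a real node):
rate = heal_rate(2_520_000, 2_548_000, 300)  # two samples 5 minutes apart
print(f"{rate:.1f} nodes/s")  # -> 93.3 nodes/s
```

If the rate trends toward zero while "pending" stays high, the heal is likely losing the race against chain progression rather than converging.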
I believe snap sync requires at least 6 vCPUs and an SSD with high IOPS to work.
This might be related:
https://github.com/ethereum/go-ethereum/issues/23191#issuecomment-904306603
Similar problem here ...
I'm ~8 days in now. The state sync took ~2 days (including some downtime while I was setting up the rest of the server); the other 6 days it's been state healing.
Raspberry Pi ARM64 (8 GiB RAM, overclocked @ 4 x 2,100 MHz), Disk: SSD (200+ MiB/s), Net: 2 Gbit/s
Not sure if relevant, but I had >40,000 of the below messages in the 6 days it's been state healing: WARN [08-22|00:03:07.286] Unexpected trienode heal packet peer=XYZ reqid=XYZ
The Geth folder currently utilizes 400 GiB (no archive).
Is the "accounts" number supposed to just keep increasing? I noticed in the logs that it reset itself to 0 several times in the last 6 days, I think whenever the node was restarted.
So, what I have so far:
- 40 thousand "Unexpected trienode heal packets" in 6 days
- Accounts go back to 0 whenever the node is restarted
This is what it shows during shutdown ... I'm not sure why it's not considered a clean shutdown; I send it gentle signals, just the default stuff that systemd sends.
Failed to journal state snapshot err="snapshot [0xYYYYYYYYYYYYY....YYYYYYYYYYYYYYYYY] missing"
Writing clean trie cache to disk path=/mnt/ssd/geth-data/geth/triecache threads=4
Persisted the clean trie cache path=/mnt/ssd/geth-data/geth/triecache elapsed=42.197ms
Blockchain stopped
Once it resumes, I see this:
[...]
Loaded last fast-sync pivot marker number=13,080,857
Snapshot maintenance disabled (syncing)
Loaded local transaction journal transactions=0 dropped=0
Upgrading chain index type=bloombits percentage=99
Regenerated local transaction journal transactions=0 accounts=0
[...]
Unclean shutdown detected booted=2021-08-xxTxx:xx:xx-xxxx age=6d12h9m
and then it restarts here:
State heal in progress [email protected] [email protected] [email protected] nodes=37,526,[email protected] pending= 1
I'm experiencing this same issue. Same setup as #23191 (comment) (Raspberry Pi 8GB system with SSD) running the latest Ethereum on ARM build. Geth version info:
Geth
Version: 1.10.8-stable
Git Commit: 26675454bf93bf904be7a43cce6b3f550115ff90
Git Commit Date: 20210824
Architecture: arm64
Go Version: go1.16.4
Operating System: linux
GOPATH=
GOROOT=go
Using the default snap sync and the log is full of "Unexpected trienode heal packet" entries.
I switched to fast sync (maintaining the already downloaded data), but it was still not done after 2-3 days. Thus, I rsync'd the whole geth folder to an aarch64 machine and completed the snap sync there, which took 1-2 days. Then I rsync'd everything back and it's running smoothly on the Pi now.
Not ideal, but worked for me.
I run several geth nodes, but I've always used Sync mode fast. I am building a new machine, so I figured I would try snap (the new default). This is running on a HP DL 380 G5 with 24 GB of RAM and a 1TB SSD.
Using SNAP sync, geth will go into these modes of compacting the database and it completely halts the sync process. It does this for over an hour sometimes. But, I let it continue. Now, this morning my sync appears to be stuck in "state heal in progress". It looks like it's been at this for hours. I give up. I am going back to Fast.
How did you go back to fast without abandoning the data you had already downloaded? What is the command like? Thanks!
If my state healing looks like t=2021-10-05T13:53:30+0000 lvl=info msg="State heal in progress" [email protected] [email protected] [email protected] nodes=408,[email protected] pending=55412, should nodes eventually reach 8 million+?
@ttibbetts
A Raspberry Pi with 4 cores is probably not capable of doing snap sync. Try disabling this feature and using regular sync.
If this can help someone: I had this issue, and in my case it was because I started geth from the config.toml file, and the node was overriding the cache attribute with the default. Starting it with --cache xxx --config /path/to/config solved it for me.
Trying to do a snap sync on a recent Ryzen machine (not just a Pi; it should be performant enough), and it has been sitting at this for over a day now.
INFO [10-08|17:56:51.721] State heal in progress [email protected] [email protected] [email protected] nodes=1,677,[email protected] pending=1
It seems to have imported receipts and headers to tip (or near to it, at least)
INFO [10-08|17:52:15.561] Imported new block receipts count=27 elapsed=51.550ms number=13,379,325 hash=744aa1..d23c2c age=1h5m56s size=2.88MiB
INFO [10-08|17:52:15.730] Imported new block receipts count=25 elapsed=56.574ms number=13,379,350 hash=56a47b..4a45e8 age=1h2m33s size=2.77MiB
INFO [10-08|17:52:15.887] Imported new block receipts count=19 elapsed=63.894ms number=13,379,369 hash=e15d90..6a1567 age=58m20s size=2.88MiB
INFO [10-08|17:52:16.025] Imported new block receipts count=23 elapsed=52.420ms number=13,379,392 hash=2b93bf..9dc915 age=54m size=2.66MiB
INFO [10-08|17:52:16.302] Imported new block receipts count=79 elapsed=213.982ms number=13,379,471 hash=d8a9e1..1b1d62 age=35m11s size=11.19MiB
The nodes-count is increasing -- it's progressing.
@holiman How high does the nodes-count need to go?
I'm at ".. nodes=86,597,863"
I'd also like to know this. I'm (sadly) only at nodes=23,591,000. Wondering if I should let the state heal complete or switch to fast sync, as it seems others have had success with that. Thank you for all the hard work you folks put in!!
My Geth 1.10.16 node running snap sync mode also reports regular "State heal in progress" and a frequent stream of repeated "Unexpected trienode heal packet" log messages. It also reports regular "Snapshot extension registration failed" messages. I don't recall seeing these messages when I was running Geth 1.10.15. I switched Geth's sync mode from "fast" to "snap" starting with Geth 1.10.15 because "fast" sync mode was no longer available.
After downgrading Geth from 1.10.16 to 1.10.15, Geth still reports frequent "Snapshot extension registration failed" and "State heal in progress" log messages, but reported no "Unexpected trienode heal packet" log messages until after I pressed Ctrl-c to shut down the node.
I have been waiting for more than 1 day after a snap sync, on a beefy machine with a beefy internet connection. It is stuck in "State heal in progress" with no indication of an ETA whatsoever. What's the point of downloading the state from the peers to speed up the sync, just to waste time doing "state heal" later? 👎
Mar 14 15:47:21 host1 geth[75528]: INFO [03-14|15:47:21.853] State heal in progress accounts=103,[email protected] slots=180,[email protected] [email protected] nodes=901,[email protected] pending=104,371
If this "heal" is going to "fix" 175M Ethereum accounts (it is now at 103,774), it will take more than a month to complete. What exactly is it going to do?
I wish Geth would honestly log that the hardware is not good enough, if it can detect that.
Totally concur. If you are providing an app bound to hardware, you must benchmark the hardware before you start the app; otherwise the app won't work, and since you are the one providing the app, you are also implicitly responsible for letting the user run it on slow hardware. The Ethereum team has been giving us low-quality engineering since 2015.
Mar 14 15:47:21 host1 geth[75528]: INFO [03-14|15:47:21.853] State heal in progress accounts=103,[email protected] slots=180,[email protected] [email protected] nodes=901,[email protected] pending=104,371
If this "heal" is going to "fix" 175M Ethereum accounts (it is now at 103,774), it will take more than a month to complete. What exactly is it going to do?
The state heal process is a little deceiving. I was at first worried that my Raspberry Pi 4B + 2TB SSD was having trouble syncing, since during state heal the accounts, slots, and nodes were going up slowly relative to how many accounts/slots there are in total.
My state heal finished in about 21 hours at 1,182,357 accounts, 2,327,247 slots, and 12,186,486 nodes. I also had many instances of "Unexpected trienode heal packet", which didn't seem to affect anything. I have seen other people finish at larger numbers, so I think it's very dependent on your own sync, but you definitely don't need to reach 175M accounts!
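As a back-of-the-envelope check of the numbers in that comment (a rough Python sketch; the figures are copied from the comment above, and the point is only that healing visits far fewer accounts than the full ~175M):

```python
accounts = 1_182_357      # accounts healed at completion (from the comment above)
hours = 21                # reported heal duration

rate = accounts / (hours * 3600)
print(f"~{rate:.1f} accounts/s healed")  # -> ~15.6 accounts/s healed

# Healing only repairs state that went stale while the snapshot was
# being downloaded, so the finish line is a small fraction of all
# accounts -- at this rate, 175M accounts would indeed take months,
# but the heal never needs to touch them all.
```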
That's good to know @leontodd .
But the question remains: how do I get any indication of the progress/ETA of the state heal?
I would like to know this too. On my first attempt, geth suddenly reported that something was not in order and that the blockchain had to be rewound with target=0 (aka "start from scratch"), which I then did, and I am now in state heal. Sync started (again) on August 4th; the sync itself was done in three days, and state heal has been running since August 7th (24 days now).
Output looks like this:
INFO [08-31|08:40:43.441] State heal in progress [email protected] [email protected] [email protected] nodes=12,480,[email protected] pending=18993
INFO [08-31|08:40:43.451] State heal in progress [email protected] [email protected] [email protected] nodes=12,480,[email protected] pending=18993
INFO [08-31|08:40:43.542] Imported new block headers count=1 elapsed=3.267ms number=15,445,101 hash=ee34c1..82f2c5
WARN [08-31|08:40:43.550] Unexpected trienode heal packet peer=4efd3711 reqid=7,328,963,860,793,291,079
WARN [08-31|08:40:43.550] Unexpected trienode heal packet peer=9bf27c44 reqid=934,092,868,083,411,094
WARN [08-31|08:40:43.550] Unexpected trienode heal packet peer=5a070bad reqid=7,559,473,032,996,682,308
WARN [08-31|08:40:43.550] Unexpected trienode heal packet peer=1290f9f1 reqid=7,402,543,419,581,238,141
WARN [08-31|08:40:43.551] Unexpected trienode heal packet peer=a85cfbac reqid=5,676,764,028,462,137,471
WARN [08-31|08:40:43.551] Unexpected trienode heal packet peer=de7b486d reqid=3,356,032,926,609,925,242
WARN [08-31|08:40:43.551] Unexpected trienode heal packet peer=9bd73479 reqid=3,878,834,481,290,902,674
WARN [08-31|08:40:43.551] Unexpected trienode heal packet peer=468924e2 reqid=6,636,285,917,303,600,443
WARN [08-31|08:40:43.553] Unexpected trienode heal packet peer=92adf5a4 reqid=3,533,469,970,467,382,671
WARN [08-31|08:40:43.554] Unexpected trienode heal packet peer=703154b2 reqid=4,629,776,562,703,112,673
WARN [08-31|08:40:43.698] Unexpected trienode heal packet peer=b70c1667 reqid=367,965,809,434,615,351
INFO [08-31|08:40:44.452] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=173.41KiB throttle=1512
WARN [08-31|08:40:44.452] Pivot became stale, moving old=15,444,960 new=15,445,039
INFO [08-31|08:40:44.536] Imported new block receipts count=79 elapsed=83.833ms number=15,445,038 hash=d3de2b..2731b3 age=12m51s size=11.35MiB
INFO [08-31|08:40:44.536] State heal in progress [email protected] [email protected] [email protected] nodes=12,480,[email protected] pending=1
WARN [08-31|08:40:44.812] Unexpected trienode heal packet peer=4883dc3a reqid=3,218,495,347,431,903,237
INFO [08-31|08:40:47.007] Imported new block headers count=1 elapsed=3.586ms number=15,445,102 hash=d8caad..e1f5ef
INFO [08-31|08:40:53.705] Imported new block headers count=1 elapsed=3.250ms number=15,445,103 hash=4eae4e..75c64a
INFO [08-31|08:40:58.106] State heal in progress [email protected] [email protected] [email protected] nodes=12,480,[email protected] pending=273
"pending" normally rises until it is somewhere between 110,000 and 130,000. Then a bunch of "Unexpected trienode heal packet" lines appear, and "pending" restarts at 1. It is rare for this to happen with "pending" below 20k, as above.
I also get lines telling me that geth doesn't like nimbus (which is running and fully synced; the connection to geth works, according to nimbus's output):
WARN [08-31|08:41:47.076] Merge is configured, but no beacon client seen. Please ensure you have one available before the transition arrives!
What really puzzles me is the output accounts=…@… slots=…@… codes=…@… in the state heal progress. I remember that these had much higher numbers on my first try.
And I am really anxious about the disk space used. It goes beyond anything I have read online that an eth1 node should need:
$ LC_ALL=C du -hsc * 2>/dev/null
628G chaindata
5.0K jwtsecret
512 lightchaindata
512 LOCK
4.5K nodekey
5.7M nodes
512 transactions.rlp
2.8M triecache
628G total
(But it has not been growing much in the last 2 weeks.)
This is my current sync state, and while there is some progress, it has been minimal over the past 3 weeks.
Welcome to the Geth JavaScript console!
instance: Geth/v1.10.23-stable/linux-amd64/go1.19
coinbase: <removed>
at block: 0 (Thu Jan 01 1970 01:00:00 GMT+0100 (CET))
datadir: <removed>
modules: admin:1.0 debug:1.0 engine:1.0 eth:1.0 ethash:1.0 miner:1.0 net:1.0 personal:1.0 rpc:1.0 txpool:1.0 web3:1.0
To exit, press ctrl-d or type exit
> eth.syncing
{
currentBlock: 15445102,
healedBytecodeBytes: 213959,
healedBytecodes: 42,
healedTrienodeBytes: 6573144668,
healedTrienodes: 12481207,
healingBytecode: 0,
healingTrienodes: 0,
highestBlock: 15445199,
startingBlock: 15442752,
syncedAccountBytes: 44351333509,
syncedAccounts: 229300287,
syncedBytecodeBytes: 4036242554,
syncedBytecodes: 622973,
syncedStorage: 788616132,
syncedStorageBytes: 168788862624
}
I am really thinking about starting from scratch...
Sync started (again) on August 4th, sync was done in three days, state heal runs since August 7th (24 days now).
If it's been trying to heal for 24 days, then it's not going to succeed. The thing with this phase is that the node has a nearly accurate state trie, with some errors in it, and it starts trying to patch it up. The errors are due to the block progression during the earlier phases, so if the first part was done "in three days", then basically there are three days of state changes that need to be applied to the state trie.
Now, while the state is healing, blocks are also progressed. So the state keeps moving. Therefore, every ~192 blocks or so, we need to move the 'pivot point' along, to target a more recent state.
WARN [08-31|08:40:44.452] Pivot became stale, moving old=15,444,960 new=15,445,039
Whenever we do so, we need to start from the root again, and go downwards while healing the trie. That is why pending becomes zero, so that's fine and as it should be.
But, if the blockchain state changes more rapidly than you are able to heal, then you will never catch up. These are the things that factor in:
- The amount of gas spent in each block (more gas == more changes)
- The speed of your disk (healing the trie is heavy on disk)
- The speed/latency of your internet connection
Now, while the state is healing, blocks are also progressed. So the state keeps moving. Therefore, every ~192 blocks or so, we need to move the 'pivot point' along, to target a more recent state.
WARN [08-31|08:40:44.452] Pivot became stale, moving old=15,444,960 new=15,445,039
Whenever we do so, we need to start from the root again, and go downwards while healing the trie. That is why "pending" becomes zero, so that's fine and as it should be.
Thanks for the explanation!
But, if the blockchain state changes more rapidly than you are able to heal, then you will never catch up. These are the things that factor in:
- The amount of gas spent in each block (more gas == more changes)
- The speed of your disk (healing the trie is heavy on disk)
$ LC_ALL=C dd if=/dev/zero of=.gethdata/test1.img bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.84334 s, 1.3 GB/s
- The speed/latency of your internet connection
Using github.com/taganaka/SpeedTest, the download speed varies between 100 and 380 Mbit/s. Upstream is always about 52 Mbit/s. Ping varies between 8 and 12 ms, and jitter is always <5 ms (usually 0).
All this makes me think that there must be something wrong within my database that causes the heal to stall and never catch up. Best would be to throw it all away, I guess... :unamused:
It would be nice if geth could detect such a state, though, and inform the user. Whether that's even possible is another question. :wink:
I ran geth removedb and removed both the local database and ancient database. When resyncing, the state heal was done in under 24 hours.
I tried this again but this time only removed the local database, and kept the ancient database. This time when resyncing the state heal went on for days without completing.
I suspect that there are too many states that need healing when resyncing from the ancient database, especially on slower hardware. This fits with your endless state heal @Yamakuzure as you resynced by rewinding the chain. I don't know much about the technical aspects of the state heal process, but resyncing from scratch has definitely helped in my case.
Possible fix: https://github.com/ethereum/go-ethereum/pull/25651 . People who have reported problems -- did you run with any particular settings for the amount of peers?
I suspect that there are too many states that need healing when resyncing from the ancient database, especially on slower hardware. This fits with your endless state heal @Yamakuzure as you resynced by rewinding the chain. I don't know much about the technical aspects of the state heal process, but resyncing from scratch has definitely helped in my case.
Oh, sorry, I aborted the (super-slow) rewinding, as a target of 0 meant an rm -rf * would do the same anyway, just faster.
I started from scratch (aka an empty geth data directory) every time and never kept the nodes or ancient data directories.
Possible fix: #25651 . People who have reported problems -- did you run with any particular settings for the amount of peers?
No.