go-ethereum
State heal phase is very slow (not finished after 2 weeks)
System information
Geth version: geth version
Geth
Version: 1.10.4-stable
Git Commit: aa637fd38a379db6da98df0d520fb1c5139a18ce
Git Commit Date: 20210617
Architecture: amd64
Go Version: go1.16.5
Operating System: linux
GOPATH=
GOROOT=go
OS & Version: Linux
Commit hash : (if develop)
Expected behaviour
geth finishes (default=snap) sync in < 1 day
Actual behaviour
geth is still in 'State heal in progress' after more than 2 weeks
Steps to reproduce the behaviour
Run geth
Logs
https://gist.github.com/xrchz/851eca7ecfb0312422875c90b9a86b2b
I've been looking through the logs a bit. I wonder if your IO speed is bad?
Also, there's a lot of this:
WARN [07-03|18:00:40.413] Unexpected trienode heal packet peer=9f0e8fbf reqid=6,915,308,639,612,522,441
It indicates that requests are timing out, and there's a lot of it. No idea why, but @karalabe added a mechanism to scale the request size depending on performance; not sure if that is in 1.10.4-stable or not. In a rush, can't check right now.
I don't think the IO speed is bad. Here is an indication (taken while geth is also still running):
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --directory=<redacted> --aux-path=<redacted>
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.27
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=136MiB/s,w=45.2MiB/s][r=34.9k,w=11.6k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=450154: Fri Jul 16 12:37:52 2021
read: IOPS=37.9k, BW=148MiB/s (155MB/s)(3070MiB/20719msec)
bw ( KiB/s): min=90808, max=196112, per=100.00%, avg=151902.44, stdev=24036.61, samples=41
iops : min=22702, max=49028, avg=37975.61, stdev=6009.16, samples=41
write: IOPS=12.7k, BW=49.5MiB/s (51.9MB/s)(1026MiB/20719msec); 0 zone resets
bw ( KiB/s): min=30840, max=66952, per=100.00%, avg=50762.73, stdev=8052.88, samples=41
iops : min= 7710, max=16738, avg=12690.68, stdev=2013.22, samples=41
cpu : usr=8.53%, sys=45.38%, ctx=105093, majf=0, minf=8
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=148MiB/s (155MB/s), 148MiB/s-148MiB/s (155MB/s-155MB/s), io=3070MiB (3219MB), run=20719-20719msec
WRITE: bw=49.5MiB/s (51.9MB/s), 49.5MiB/s-49.5MiB/s (51.9MB/s-51.9MB/s), io=1026MiB (1076MB), run=20719-20719msec
Disk stats (read/write):
sda: ios=783034/261889, merge=32/32, ticks=870354/151608, in_queue=1021965, util=99.63%
What are the machine specs?
8GB RAM, 4x 2.2GHz vCPUs (AMD EPYC 7601)
I have the same issue: 2+ weeks and still not done with "State heal in progress". Geth version:
Geth
Version: 1.10.6-stable
Architecture: amd64
Go Version: go1.16.6
Operating System: darwin
GOPATH=
GOROOT=go
Log snippet:
WARN [07-28|10:29:46.515] Pivot became stale, moving old=12,914,818 new=12,914,882
INFO [07-28|10:29:50.427] Imported new block headers count=1 elapsed=253.434ms number=12,914,945 hash=ee1a08..9ce38a
INFO [07-28|10:30:00.224] Imported new block receipts count=64 elapsed=13.703s number=12,914,881 hash=fef964..d789fc age=18m5s size=7.69MiB
INFO [07-28|10:30:18.658] Imported new block headers count=1 elapsed=46.715ms number=12,914,946 hash=7b24c8..2d8006
INFO [07-28|10:30:21.965] State heal in progress accounts=169,[email protected] [email protected] [email protected] nodes=43,293,[email protected] pending=112,626
INFO [07-28|10:30:21.969] State heal in progress accounts=169,[email protected] [email protected] [email protected] nodes=43,293,[email protected] pending=112,626
WARN [07-28|10:30:22.014] Unexpected trienode heal packet peer=cef48f42 reqid=6,500,665,912,563,462,364
WARN [07-28|10:30:22.014] Unexpected trienode heal packet peer=df5f16fe reqid=2,588,354,188,352,030,655
WARN [07-28|10:30:22.081] Unexpected trienode heal packet peer=07527e24 reqid=2,299,545,781,862,247,020
WARN [07-28|10:30:22.097] Unexpected trienode heal packet peer=06fbcb3d reqid=4,616,793,971,286,455,283
WARN [07-28|10:30:22.117] Unexpected trienode heal packet peer=f26bae89 reqid=5,492,659,643,994,326,497
WARN [07-28|10:30:22.150] Unexpected trienode heal packet peer=8fe0dc37 reqid=4,841,696,059,284,633,551
WARN [07-28|10:30:22.166] Unexpected trienode heal packet peer=d8c491e0 reqid=5,089,512,569,912,806,296
WARN [07-28|10:30:22.214] Unexpected trienode heal packet peer=ab94eac9 reqid=289,535,681,122,254,882
INFO [07-28|10:30:29.698] Imported new block headers count=1 elapsed=50.129ms number=12,914,947 hash=502883..e4a4cf
INFO [07-28|10:30:40.282] Imported new block headers count=1 elapsed=68.952ms number=12,914,948 hash=ca7379..feb0e4
INFO [07-28|10:30:42.053] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=246.41KiB throttle=1064
INFO [07-28|10:30:51.755] Imported new block headers count=1 elapsed=455.293ms number=12,914,949 hash=69f8b7..c0be19
INFO [07-28|10:31:23.127] Imported new block headers count=1 elapsed=36.240ms number=12,914,950 hash=1c41d1..32309f
INFO [07-28|10:31:51.161] Imported new block headers count=2 elapsed=303.134ms number=12,914,952 hash=05fedd..bd977f age=1m4s
INFO [07-28|10:31:53.863] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=251.06KiB throttle=1045
INFO [07-28|10:31:58.571] Imported new block headers count=1 elapsed=50.721ms number=12,914,953 hash=ade2c1..0a5bc0
INFO [07-28|10:32:06.890] Imported new block headers count=1 elapsed=635.477ms number=12,914,954 hash=804d5c..ab7b25
INFO [07-28|10:32:10.395] Imported new block headers count=2 elapsed=488.349ms number=12,914,956 hash=1e8a8f..3e8b40
INFO [07-28|10:32:48.525] Imported new block headers count=1 elapsed=80.333ms number=12,914,957 hash=e45180..9b6220
INFO [07-28|10:32:59.615] Imported new block headers count=1 elapsed=47.949ms number=12,914,958 hash=d709af..d8cf88
INFO [07-28|10:33:00.748] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=236.64KiB throttle=1108
INFO [07-28|10:33:07.879] State heal in progress accounts=169,[email protected] [email protected] [email protected] nodes=43,293,[email protected] pending=4369
INFO [07-28|10:33:14.157] Imported new block headers count=1 elapsed=66.366ms number=12,914,959 hash=b48f74..bf44f3 age=1m6s
INFO [07-28|10:33:17.898] Imported new block headers count=1 elapsed=34.263ms number=12,914,960 hash=0bd389..15919a
INFO [07-28|10:33:22.825] State heal in progress accounts=169,[email protected] [email protected] [email protected] nodes=43,293,[email protected] pending=5004
INFO [07-28|10:33:28.418] Imported new block headers count=1 elapsed=36.038ms number=12,914,961 hash=71baf1..463032
INFO [07-28|10:33:35.198] State heal in progress accounts=169,[email protected] [email protected] [email protected] nodes=43,293,[email protected] pending=5484
INFO [07-28|10:33:36.183] Imported new block headers count=1 elapsed=53.880ms number=12,914,962 hash=d3a956..4d2c35
INFO [07-28|10:34:04.801] Imported new block headers count=1 elapsed=49.827ms number=12,914,963 hash=3a9efb..280236
INFO [07-28|10:34:05.954] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=249.31KiB throttle=1052
INFO [07-28|10:34:11.899] Imported new block headers count=1 elapsed=51.650ms number=12,914,964 hash=b03469..eed621
Any clue how much longer, based on the logs? Total number of nodes, etc.?
Same issue here. IO is more than enough. Geth version:
Geth
Version: 1.10.6-stable
Git Commit: 576681f29b895dd39e559b7ba17fcd89b42e4833
Git Commit Date: 20210722
Architecture: amd64
Go Version: go1.16.4
Operating System: linux
GOPATH=
GOROOT=go
machine
CPU: AMD EPYC 7R32 @ 8x 3.224GHz
RAM: 3485MiB / 15853MiB
eth_syncing
{
"jsonrpc": "2.0",
"id": "curltest",
"result": {
"currentBlock": "0xc5a113",
"highestBlock": "0xc5a15d",
"knownStates": "0x2b8c155a",
"pulledStates": "0x2b8c155a",
"startingBlock": "0xc595dd"
}
}
The difference between currentBlock and highestBlock is always about 50~150.
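(For reference, the numbers above come from the eth_syncing RPC call; assuming geth's HTTP RPC is enabled on the default localhost:8545, they can be fetched and compared like this:)
# query sync progress over JSON-RPC
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":"curltest"}' \
  http://127.0.0.1:8545
# convert the hex block numbers to decimal to see the gap (values taken from the output above)
printf '%d %d\n' 0xc5a15d 0xc5a113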
@bFanek Your node is downloading 2 trie nodes per second (1 trie node is 500 bytes). On a different log, it says writing 7MB to the database took 14 seconds. Something's very off there. What kind of machine are you running Geth on?
@kraymond37 You'll need to share some logs. How fast are trie nodes being downloaded?
@xrchz The logs you shared show the initial snap sync, and even then, only the beginning. You say you've been running state heal for weeks, but no such thing appears in the logs you've shared.
@karalabe Gist automatically truncates large files. You need to click "view the full file" at the top of the gist page to see it all.
Oh, doh. Apologies.
@karalabe Finally finished. But I don't know why it is so slow. Here is the log.
I've got the same issue. Syncing mainnet and rinkeby, both nodes have been running state heal for 2-3 weeks now.
Is there a resolution to this?
Similar problem here ...
I'm ~8 days in now ... the state sync took ~2 days (including some downtimes while I was setting up the rest of the server). The other 6 days it's been state healing.
Raspberry Pi ARM-64 (8 GiB RAM, overclocked @ 4 x 2,100 MHz), Disk: SSD (200+ MiB/s), Net: 2 Gbit/s
Not sure if relevant, but I had >40,000 of the below messages in the 6 days it's been state healing:
WARN [08-22|00:03:07.286] Unexpected trienode heal packet peer=XYZ reqid=XYZ
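(A count like that can be reproduced with something along these lines; the log file name and systemd unit name are assumptions and depend on how geth's output is captured:)
grep -c 'Unexpected trienode heal packet' geth.log
# or, when geth runs under systemd:
journalctl -u geth | grep -c 'Unexpected trienode heal packet'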
The Geth folder currently utilizes 400 GiB (no archive).
FYI, I posted more detail in the other ticket: https://github.com/ethereum/go-ethereum/issues/23228
EDIT:
I switched to fast sync (maintaining the already downloaded data), but it was still not done after 2-3 days. Thus, I rsync'd the whole geth folder to an aarch64 machine and completed the snap sync there, which took 1-2 days. Then I rsync'd everything back and it's running smoothly on the Pi now.
Not ideal, but worked for me.
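(A rough sketch of that copy-over workflow, assuming the default datadir under ~/.ethereum and a placeholder hostname for the faster machine; geth must be stopped on both machines before each copy:)
# on the slow machine, after stopping geth: push the datadir to the faster box
rsync -a --progress ~/.ethereum/ user@fast-machine:~/.ethereum/
# finish the sync on the fast machine, stop geth there, then pull everything back
rsync -a --progress user@fast-machine:~/.ethereum/ ~/.ethereum/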
@xrchz Have you solved this problem? I'm facing the same problem as you. Can you tell me what happened over the following days? Thanks! (Or can I switch to fast mode without deleting the data I have already downloaded?)
No I have not solved it. I stopped trying to use geth myself and switched to using full node service providers instead.
sad news :(
I don't know much about full node service providers; which one did you choose? My goal is to get full node data of Ethereum by syncing with geth and then use my modified geth client for scientific tests. Do you think full node service providers would work for me? Can I get the same "datadir" folder using one? Thanks again!!
Try the other sync options, maybe one works better.
Can I just interrupt the snap sync with Ctrl+C and use fast sync to continue with the same data folder? (I don't want to lose what I have downloaded over the last week.)
When I did this recently, it re-used the already downloaded chaindata (~470 GiB). Not sure about the rest.
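(For reference, that restart amounted to something like the command below at the time; the datadir path is a placeholder, and fast sync has since been removed from geth, so this only applies to older versions:)
# stop the running geth with Ctrl+C, then restart against the same data folder
geth --datadir /path/to/existing/datadir --syncmode fast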
Thanks! BTW I noticed the resync didn't work for you, right? 😂
I didn't have time to test a lot of things, so I just moved the downloaded chaindata to an x86-64 machine and did a snap sync there ... which worked. Then I transferred it back to the ARM and it's been running smoothly since.
How can I increase the throttle?
Hello everyone. When I ran into this issue, I had started my geth with an inaccessible ancient store due to faulty permissions.
Snap sync was used by default. While it was running I made another mistake and my system partition ran out of disk space. After this was solved, the 4 GB of RAM was exceeded, the swap failed, and the system halted.
So we can expect a corrupted database at this point?
After this got fixed I observed the behaviour described: a lot of "Unexpected trienode heal packet".
My port forwarding was also missing, but fixing that did not seem to resolve it either.
IPv6 was enabled; I disabled it and started with "--syncmode fast".
The "Unexpected trienode heal packet" warnings disappeared from the logs and it's syncing.
System: AMD Ryzen in a VirtualBox VM attached to a USB 3.0 NVMe drive running Ubuntu Server 20 (4 vCPU, 6 GB RAM, 6 GB swap)
Hi.
An 8 GB Raspberry Pi 4 is able to complete a snap sync in less than 2 days. Please see this:
https://twitter.com/EthereumOnARM/status/1463796036103770114
And check these points:
https://twitter.com/EthereumOnARM/status/1463828024080707585
Hi, I've also been trying to snap sync a geth node in AWS and have also been stuck in state healing for a few weeks. I've synced nodes locally before within a few days, so this was surprising to me. I'm using NVMe storage on the AWS node, so I would expect it has the IOPS to sync.
A log snippet:
INFO [03-08|21:10:17.620] State heal in progress accounts=1,549,[email protected] slots=4,125,[email protected] [email protected] nodes=163,743,[email protected] pending=85071
WARN [03-08|21:10:17.622] Unexpected trienode heal packet peer=73755973 reqid=4,134,898,869,596,324,111
WARN [03-08|21:10:17.633] Unexpected trienode heal packet peer=25f05f19 reqid=7,011,098,481,708,786,349
WARN [03-08|21:10:17.671] Unexpected trienode heal packet peer=9c8ce4ee reqid=7,218,003,286,183,975,893
WARN [03-08|21:10:17.674] Unexpected trienode heal packet peer=bcddcc5d reqid=1,911,971,051,676,565,964
INFO [03-08|21:10:22.508] Imported new block headers count=1 elapsed=8.076ms number=14,348,482 hash=de4c70..134896
INFO [03-08|21:10:31.658] Imported new block headers count=1 elapsed=7.743ms number=14,348,483 hash=809a3e..684cf6
INFO [03-08|21:10:33.379] State heal in progress accounts=1,549,[email protected] slots=4,125,[email protected] [email protected] nodes=163,743,[email protected] pending=85505
INFO [03-08|21:10:47.765] State heal in progress accounts=1,550,[email protected] slots=4,125,[email protected] [email protected] nodes=163,743,[email protected] pending=85906
I have seen that others have solved this by switching to fast sync mode, but this is no longer an option in the geth version I'm using (1.10.16-stable).
Geth on my Raspberry Pi 4 8 GB has also continuously been reporting "State heal in progress" and frequent "Unexpected trienode heal packet" log messages for a few weeks after I deleted the Geth database and restarted the sync from the beginning. What do these messages mean? Is my Geth node still syncing? Should I bother to continue running my Geth node until this issue is resolved?
Had the same and I decided to sync on a different (faster) computer instead ... I then rsync'd the blockchain data to the RPi after the sync was complete. Not an ideal solution, but worked. Might have mentioned this before somewhere in the tickets here, sorry if that's the case.
I'm also struggling with this problem, and I'm also running Geth on a Raspberry PI 4 and an external SSD.
As I just found out, my SSD has quite low IOPS (I'm using WD Blue 1Tb). I'm following these instructions to test it: https://docs.rocketpool.net/guides/node/local/prepare-pi.html#testing-the-ssd-s-performance
Not sure what the cause is. Maybe it's outdated firmware or a bad SATA-to-USB adapter. Anyway, it's not a Geth bug in my case.
I'm also using a WD Blue 1 TB with my Raspberry Pi 4 8 GB, and its IO performance (read: IOPS=2284, BW=9139KiB/s; write: IOPS=763, BW=3054KiB/s) is much lower than what the Rocketpool guide recommends. However, Geth was running without errors or warnings on this hardware until a few point versions prior to 1.10.16. Why would it encounter these problems now?
My node has finally finished syncing! It took about 5 days, with the state healing phase taking about half of that.
My setup:
- Raspberry Pi 4 Model B Rev 1.5, 8 GB.
- WD Blue 1 TB connected via a cheap SATA-to-USB adapter. IOPS and bandwidth seem to be low (geth was not running during the test):
test: (groupid=0, jobs=1): err= 0: pid=1820: Fri Mar 25 13:19:16 2022
read: IOPS=3043, BW=11.9MiB/s (12.5MB/s)(3070MiB/258259msec)
bw ( KiB/s): min= 4728, max=14672, per=99.99%, avg=12170.61, stdev=1868.41, samples=516
iops : min= 1182, max= 3668, avg=3042.62, stdev=467.11, samples=516
write: IOPS=1017, BW=4068KiB/s (4166kB/s)(1026MiB/258259msec); 0 zone resets
bw ( KiB/s): min= 1544, max= 5104, per=99.99%, avg=4067.59, stdev=651.52, samples=516
iops : min= 386, max= 1276, avg=1016.85, stdev=162.86, samples=516
cpu : usr=3.86%, sys=24.14%, ctx=1048480, majf=0, minf=19
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=11.9MiB/s (12.5MB/s), 11.9MiB/s-11.9MiB/s (12.5MB/s-12.5MB/s), io=3070MiB (3219MB), run=258259-258259msec
WRITE: bw=4068KiB/s (4166kB/s), 4068KiB/s-4068KiB/s (4166kB/s-4166kB/s), io=1026MiB (1076MB), run=258259-258259msec
I'd say that the first phase (block downloading) is limited by the network. I didn't have many peers, so it took about 2.5 days to finish. Incoming traffic was between 2 and 4 Mb/s. The state healing phase seems to be limited by CPU and IOPS. My RPI is not overclocked (some recommend overclocking it; the usual config.txt tweak is sketched below), and average read IOPS was around 1.3K during that phase (underperforming? maybe because of the CPU?). Read bandwidth was around 11 MiB/s, which matches the number from the test.
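(The overclock usually referred to is a couple of lines in /boot/config.txt; the values below are the ones commonly quoted for a Pi 4, not something verified in this thread, and adequate cooling is assumed:)
# /boot/config.txt
over_voltage=6
arm_freq=2000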
I posted some statistics to my Twitter: https://twitter.com/jeiwan7/status/1507565655079686145
These were the last "State heal in progress" lines in the logs (the node was restarted a couple of times):
accounts=470,[email protected] slots=883,[email protected] [email protected] nodes=40,140,[email protected] pending=10152
accounts=470,[email protected] slots=884,[email protected] [email protected] nodes=40,141,[email protected] pending=8943
accounts=470,[email protected] slots=884,[email protected] [email protected] nodes=40,143,[email protected] pending=7697
accounts=470,[email protected] slots=885,[email protected] [email protected] nodes=40,145,[email protected] pending=6533
accounts=470,[email protected] slots=886,[email protected] [email protected] nodes=40,148,[email protected] pending=4126
accounts=470,[email protected] slots=886,[email protected] [email protected] nodes=40,149,[email protected] pending=2891
accounts=470,[email protected] slots=887,[email protected] [email protected] nodes=40,151,[email protected] pending=149
accounts=470,[email protected] slots=887,[email protected] [email protected] nodes=40,151,[email protected] pending=0
It seems that pending needs to get down to 0 for the phase to finish.
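(One way to keep an eye on that counter; the log file name is an assumption and depends on how geth's output is captured:)
# show the most recent heal progress line, refreshed every 30 seconds
watch -n 30 "grep 'State heal in progress' geth.log | tail -n 1"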
Geth
Version: 1.10.16-stable
Git Commit: 20356e57b119b4e70ce47665a71964434e15200d
Git Commit Date: 20220215
Architecture: arm64
Go Version: go1.17.5
Operating System: linux
GOPATH=
GOROOT=go
Hi, I'm experiencing the same problem now. I had an unclean shutdown three days ago, node was offline for two hours, and it seems that it has not yet recovered.
- Still "State heal in progress"
- Lots of "Unexpected trienode heal packet"
- pending seems to go up every time
I have been running geth on this machine for more than a year. SSD stats:
READ: bw=112MiB/s (118MB/s), 112MiB/s-112MiB/s (118MB/s-118MB/s), io=3070MiB (3219MB), run=27321-27321msec
WRITE: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1026MiB (1076MB), run=27321-27321msec
geth version 1.10.16-stable
Is there something I should do to speed it up? Maybe increase bandwidth size?
I'm running geth with default values for everything:
geth --mainnet --datadir ... --http --http.api eth,net,web3
You need v1.10.19 or later in order to sync.
Thanks, I upgraded the geth version but the problem still remains.
@newearthmartin
Run the iostat -d -x -m 10 command on your Linux box.
The r/s column (read requests per second) should be around 1500-2000, and the latency (column r_await) should be around 80 microseconds (i.e. 0.08 ms).
If your r/s is lower than that, you are going to be state-healing for weeks.
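(A small helper to watch just those two numbers; the device name is an assumption, and the field positions match recent sysstat versions, where r/s is the 2nd and r_await the 6th column of iostat -d -x output:)
iostat -d -x -m 10 sda | awk '$1 == "sda" { printf "r/s=%s  r_await=%s ms\n", $2, $6 }'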
@newearthmartin
If it turns out you have a slow SSD, you can speed it up by buying a smaller but fast SSD and configuring it as a caching device for the slow SSD via the bcache kernel module.
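(A minimal sketch of that bcache setup; device names are placeholders, the commands wipe both devices, and the geth datadir would then be moved onto the new bcache volume:)
# install the tools (Debian/Ubuntu)
sudo apt install bcache-tools
# create the backing (slow) and cache (fast) devices in one go and attach them
sudo make-bcache -B /dev/sdX -C /dev/nvme0nY
# the combined device shows up as /dev/bcache0; format and mount it
sudo mkfs.ext4 /dev/bcache0
sudo mount /dev/bcache0 /mnt/geth-data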
@nuliknol Here are some values from iostat while running geth:
r/s r_await
1980.40 0.32
1953.30 0.33
1732.00 0.32
1654.20 0.35
2095.00 0.33
Does this mean r/s is ok but latency is not? Any way that I can improve this? I'm using a 2TB WD Blue on a 2016 8GB Asus Zenbook laptop running Ubuntu 22.04.
What I don't understand is that I've been running a full node for a year; why won't geth recover from this? You say weeks... when I downloaded the blockchain originally, it took less...
Also, should I be looking into Erigon for this hardware? :) I really would like to stay with Geth.
Is there a way to see the progress of the state heal phase and how much more it has to go? Also, if I restart geth, do I lose the progress?
Hello, I'm experiencing the same issue.
I ran the Rocketpool test and the results are awful:
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][99.9%][r=276KiB/s,w=88KiB/s][r=69,w=22 IOPS][eta 00m:05s]
test: (groupid=0, jobs=1): err= 0: pid=384253: Mon Aug 1 13:14:52 2022
read: IOPS=210, BW=842KiB/s (863kB/s)(3070MiB/3731597msec)
bw ( KiB/s): min= 8, max=81176, per=100.00%, avg=879.52, stdev=3872.75, samples=7137
iops : min= 2, max=20294, avg=219.84, stdev=968.19, samples=7137
write: IOPS=70, BW=282KiB/s (288kB/s)(1026MiB/3731597msec); 0 zone resets
bw ( KiB/s): min= 8, max=27040, per=100.00%, avg=294.19, stdev=1298.58, samples=7130
iops : min= 2, max= 6760, avg=73.50, stdev=324.65, samples=7130
cpu : usr=0.12%, sys=0.43%, ctx=76826, majf=0, minf=8
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=842KiB/s (863kB/s), 842KiB/s-842KiB/s (863kB/s-863kB/s), io=3070MiB (3219MB), run=3731597-3731597msec
WRITE: bw=282KiB/s (288kB/s), 282KiB/s-282KiB/s (288kB/s-288kB/s), io=1026MiB (1076MB), run=3731597-3731597msec
Disk stats (read/write):
sda: ios=819211/295271, merge=6718/35856, ticks=151863026/55438750, in_queue=208094988, util=97.34%
This strikes me as a bit strange, since I had previously synced geth successfully on this drive.
These are the specs for the SSD:
Device Model: HP SSD S650 1920GB
Serial Number: HBSA51350100117
Firmware Version: T1030A0
User Capacity: 1 920 383 410 176 bytes [1.92 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Aug 1 13:52:08 2022 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
I don't know how to proceed. Maybe there's some erroneous configuration on my part. I'm running geth 1.10.20
Hello @Jeiwan !
Thanks for your help. I was happy to see that you could sync using a Raspberry Pi. I have much more powerful equipment and was about to give up.
nodes=40,151,[email protected]
For me, your message showing the total number of nodes when you finished syncing was the most useful thing here. All I wanted to know was whether my sync was progressing or not.
Can you update us with the total number of nodes today?
I am still syncing. I hope you're not much further than 40,151,717. I am at about 8 million nodes now.
Thanks. :)
Hi @machinasapiens!
I don't have access to my RPI at the moment so I cannot check that. I had multiple full re-syncs on that same RPI and SSD and each of them went fine.
As to the number of nodes, I think it gets reset after each restart (but progress is not lost).
As far as I understand the synchronization process, there are two stages: during the first stage, it downloads block headers; during the second stage, it reconstructs the account state. The length of the latter cannot be estimated because the account state is a Patricia Trie (which gets updated while your node is syncing). So you just need to wait 🤷‍♂️ If it keeps utilizing CPU and memory, everything's fine.
Excellent information. Thanks a lot. 🙌