erigon icon indicating copy to clipboard operation
erigon copied to clipboard

Memory limits? (how to prevent OOM kill?)

Open mizu-eth opened this issue 2 years ago • 25 comments

Hi again,

I've noticed Erigon swapping at times so I tried disabling my swap entirely. However, this has resulted in Erigon being OOM killed after a few days of syncing. I then tried disabling memory overcommitment system-wide (sysctl vm.overcommit_memory=2) but this is much worse still since it results in Erigon crashing after one or two minutes with the message fatal error: runtime: out of memory (and some very long stack traces). This is during initial sync, during the Execution phase (7/16).

I have 16GB of RAM + 8GB of swap and am running a Linux 5 kernel.

So I have a few questions:

  1. How does Erigon detect how much memory it can use? I have tried running it in a docker container with mem_limit and memswap_limit set to 8g but this just results in Erigon being OOM-killed a few minutes into block execution once total system memory usage approaches 9GB (I only have Erigon running, excepting system daemons). So it almost seems like Erigon tries to detect hardware memory limits.
  2. Would it make sense to add a command-line option to explicitly limit memory usage?
  3. Are there any practical steps I can take right now to limit Erigon's memory usage so that it doesn't get OOM-killed?

mizu-eth avatar Sep 21 '22 22:09 mizu-eth

probably fixed by https://github.com/ledgerwatch/erigon/pull/5253

AskAlexSharov avatar Sep 22 '22 07:09 AskAlexSharov

  1. yes, we can do it in 90% of cases, but not in 100%

AskAlexSharov avatar Sep 22 '22 07:09 AskAlexSharov

  1. you can:
  • update to by https://github.com/ledgerwatch/erigon/pull/5253
  • hardcode to cmd/erigon.go next line compress.SetDecompressionTableCondensity(6)

AskAlexSharov avatar Sep 22 '22 07:09 AskAlexSharov

Thank you, @AskAlexSharov.

  1. yes, we can do it in 90% of cases, but not in 100%

That would probably still help! At least that way I could prevent Erigon and my CL client from eating each other's RAM space and swapping.

  1. you can:

I tried this. I'm now on the latest devel version (erigon/v2022.09.3-dev-fb0f811f/linux-amd64/go1.19.1) with that commit, but with mem_limit and memswap_limit set to 7g, Erigon still gets OOM-killed after a few minutes in the Execution phase.

* hardcode to `cmd/erigon.go` next line `compress.SetDecompressionTableCondensity(6)`

Unfortunately, I don't understand what this is supposed to do and where in the code to put it :/

mizu-eth avatar Sep 22 '22 09:09 mizu-eth

  • show your logs
  • add --pprof flag, when you see high mem usage - do go tool pprof -inuse_space -png http://127.0.0.1:6060/debug/pprof/heap > mem.png

AskAlexSharov avatar Sep 22 '22 10:09 AskAlexSharov

  • show your logs

There's nothing special in the logs. Here's the full log:

[INFO] [09-22|09:23:46.305] Build info                               git_branch=HEAD git_tag=v2021.10.03-1902-gfb0f811f1-dirty git_commit=fb0f811f19c41d2f005fc6173e4af1e6f01f6319
[INFO] [09-22|09:23:46.305] Starting Erigon on Ethereum mainnet...
[INFO] [09-22|09:23:46.315] Maximum peer count                       ETH=100 total=100
[INFO] [09-22|09:23:46.315] starting HTTP APIs                       APIs=eth,erigon,engine,web3,net,debug,trace,txpool
[INFO] [09-22|09:23:46.315] torrent verbosity                        level=WRN
[INFO] [09-22|09:23:46.570] [torrent] Public IP                      ip=[...]
[INFO] [09-22|09:23:46.570] Set global gas cap                       cap=50000000
[INFO] [09-22|09:23:46.611] Opening Database                         label=chaindata path=/home/erigon/.local/share/erigon/chaindata
[INFO] [09-22|09:23:46.647] Initialised chain configuration          config="{ChainID: 1, Homestead: 1150000, DAO: 1920000, DAO Support: true, Tangerine Whistle: 2463000, Spurious Dragon: 2675000, Byzantium: 4370000, Constan
00, Petersburg: 7280000, Istanbul: 9069000, Muir Glacier: 9200000, Berlin: 12244000, London: 12965000, Arrow Glacier: 13773000, Gray Glacier: 15050000, Terminal Total Difficulty: 58750000000000000000000, Merge Netsplit: <nil
nil>, Cancun: <nil>, Engine: ethash}" genesis=0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
[INFO] [09-22|09:23:46.647] Effective                                prune_flags= snapshot_flags="--snapshots=true" history.v2=false
[INFO] [09-22|09:23:49.798] Initialising Ethereum protocol           network=1
[INFO] [09-22|09:23:49.798] Disk storage enabled for ethash DAGs     dir=/home/erigon/.local/share/erigon/ethash-dags count=2
[INFO] [09-22|09:23:50.547] Starting private RPC server              on=[...]
[INFO] [09-22|09:23:50.547] new subscription to logs established
[INFO] [09-22|09:23:50.547] rpc filters: subscribing to Erigon events
[INFO] [09-22|09:23:50.547] New txs subscriber joined
[INFO] [09-22|09:23:50.547] new subscription to newHeaders established
[INFO] [09-22|09:23:50.547] Reading JWT secret                       path=/home/erigon/.local/share/erigon/jwt.hex
[INFO] [09-22|09:23:50.547] HTTP endpoint opened for Engine API      url=[...] ws=true ws.compression=true
[INFO] [09-22|09:23:50.548] HTTP endpoint opened                     url=[...] ws=false ws.compression=true grpc=false
[INFO] [09-22|09:23:50.562] [1/16 Snapshots] Fetching torrent files metadata
[INFO] [09-22|09:23:50.581] Started P2P networking                   version=66 self=[...] name=erigon/v2022.09.3-dev-fb0f811f/linux-amd64/go1.19.1
[INFO] [09-22|09:23:53.595] [Snapshots] Stat                         blocks=15000k indices=15000k alloc=2.5GB sys=2.6GB
[INFO] [09-22|09:23:53.596] [7/16 Execution] Blocks execution        from=13081565 to=15537393
[INFO] [09-22|09:23:54.220] [txpool] Started
[INFO] [09-22|09:24:13.613] [7/16 Execution] Executed blocks         number=13081942 blk/s=18.8 tx/s=3838.2 Mgas/s=293.8 gasState=0.01 batch=10.9MB alloc=2.4GB sys=4.4GB
[INFO] [09-22|09:24:33.642] [7/16 Execution] Executed blocks         number=13082393 blk/s=22.5 tx/s=4573.9 Mgas/s=350.9 gasState=0.02 batch=25.3MB alloc=3.2GB sys=4.4GB
[INFO] [09-22|09:24:50.548] [txpool] stat                            block=15537393 pending=10000 baseFee=0 queued=1335 alloc=3.6GB sys=4.5GB
[INFO] [09-22|09:24:53.654] [7/16 Execution] Executed blocks         number=13082877 blk/s=24.2 tx/s=4905.9 Mgas/s=373.5 gasState=0.04 batch=39.7MB alloc=4.0GB sys=4.5GB
[INFO] [09-22|09:25:13.617] [7/16 Execution] Executed blocks         number=13083396 blk/s=26.0 tx/s=4875.8 Mgas/s=403.1 gasState=0.05 batch=55.4MB alloc=2.9GB sys=4.6GB
[INFO] [09-22|09:25:33.599] [7/16 Execution] Executed blocks         number=13083908 blk/s=25.6 tx/s=4717.7 Mgas/s=398.3 gasState=0.07 batch=70.6MB alloc=3.9GB sys=4.6GB
[INFO] [09-22|09:25:49.799] [p2p] GoodPeers                          eth66=9
[INFO] [09-22|09:25:50.549] [txpool] stat                            block=15537393 pending=10000 baseFee=0 queued=2710 alloc=2.4GB sys=4.7GB
[INFO] [09-22|09:25:53.623] [7/16 Execution] Executed blocks         number=13084438 blk/s=26.5 tx/s=4602.9 Mgas/s=408.0 gasState=0.08 batch=85.4MB alloc=2.9GB sys=4.7GB
[INFO] [09-22|09:26:13.598] [7/16 Execution] Executed blocks         number=13085057 blk/s=31.0 tx/s=5410.8 Mgas/s=478.5 gasState=0.10 batch=104.0MB alloc=4.4GB sys=4.8GB
[INFO] [09-22|09:26:33.605] [7/16 Execution] Executed blocks         number=13085677 blk/s=31.0 tx/s=6393.9 Mgas/s=481.0 gasState=0.12 batch=122.7MB alloc=4.0GB sys=4.9GB
[INFO] [09-22|09:26:50.549] [txpool] stat                            block=15537393 pending=10000 baseFee=0 queued=3883 alloc=2.8GB sys=4.9GB
[INFO] [09-22|09:26:53.613] [7/16 Execution] Executed blocks         number=13086301 blk/s=31.2 tx/s=5814.5 Mgas/s=480.4 gasState=0.13 batch=142.0MB alloc=3.4GB sys=4.9GB
[INFO] [09-22|09:27:13.602] [7/16 Execution] Executed blocks         number=13086913 blk/s=30.6 tx/s=6224.8 Mgas/s=476.2 gasState=0.15 batch=161.5MB alloc=2.6GB sys=5.0GB
[INFO] [09-22|09:27:33.645] [7/16 Execution] Executed blocks         number=13087592 blk/s=33.9 tx/s=7034.0 Mgas/s=518.7 gasState=0.17 batch=183.8MB alloc=4.6GB sys=5.1GB
[INFO] [09-22|09:27:49.799] [p2p] GoodPeers                          eth66=11
[INFO] [09-22|09:27:50.551] [txpool] stat                            block=15537393 pending=10000 baseFee=0 queued=5831 alloc=3.2GB sys=5.2GB
[INFO] [09-22|09:27:53.627] [7/16 Execution] Executed blocks         number=13088276 blk/s=34.2 tx/s=6771.8 Mgas/s=534.1 gasState=0.19 batch=206.1MB alloc=3.9GB sys=5.2GB
[INFO] [09-22|09:28:13.644] [7/16 Execution] Executed blocks         number=13088953 blk/s=33.8 tx/s=6580.3 Mgas/s=523.4 gasState=0.21 batch=229.3MB alloc=3.3GB sys=5.2GB
[INFO] [09-22|09:28:33.623] [7/16 Execution] Executed blocks         number=13089668 blk/s=35.8 tx/s=6767.1 Mgas/s=554.2 gasState=0.23 batch=251.1MB alloc=2.9GB sys=5.3GB
[INFO] [09-22|09:28:50.547] [txpool] stat                            block=15537393 pending=10000 baseFee=0 queued=8280 alloc=3.0GB sys=5.3GB
[INFO] [09-22|09:28:53.625] [7/16 Execution] Executed blocks         number=13090179 blk/s=25.5 tx/s=4340.5 Mgas/s=392.1 gasState=0.24 batch=264.4MB alloc=3.4GB sys=5.3GB
[INFO] [09-22|09:29:13.616] [7/16 Execution] Executed blocks         number=13090682 blk/s=25.2 tx/s=4514.6 Mgas/s=389.8 gasState=0.26 batch=279.8MB alloc=4.2GB sys=5.3GB
[INFO] [09-22|09:29:33.624] [7/16 Execution] Executed blocks         number=13091173 blk/s=24.5 tx/s=4380.7 Mgas/s=376.7 gasState=0.27 batch=294.9MB alloc=4.8GB sys=5.3GB
[INFO] [09-22|09:29:49.799] [p2p] GoodPeers                          eth66=15
[INFO] [09-22|09:29:50.550] [txpool] stat                            block=15537393 pending=10000 baseFee=0 queued=10609 alloc=4.4GB sys=5.3GB
[INFO] [09-22|09:29:53.686] [7/16 Execution] Executed blocks         number=13091595 blk/s=21.0 tx/s=3754.1 Mgas/s=324.7 gasState=0.28 batch=308.0MB alloc=4.9GB sys=5.3GB
[INFO] [09-22|09:30:13.631] [7/16 Execution] Executed blocks         number=13092089 blk/s=24.8 tx/s=4787.4 Mgas/s=385.5 gasState=0.30 batch=322.6MB alloc=3.0GB sys=5.4GB
[INFO] [09-22|09:30:33.655] [7/16 Execution] Executed blocks         number=13092555 blk/s=23.3 tx/s=4164.0 Mgas/s=357.0 gasState=0.31 batch=336.4MB alloc=3.4GB sys=5.4GB
[INFO] [09-22|09:30:50.548] [txpool] stat                            block=15537393 pending=10000 baseFee=0 queued=14204 alloc=3.5GB sys=5.4GB
[INFO] [09-22|09:30:53.605] [7/16 Execution] Executed blocks         number=13093029 blk/s=23.8 tx/s=4280.9 Mgas/s=366.0 gasState=0.32 batch=351.2MB alloc=3.9GB sys=5.4GB
erigon exited with code 137

Here's the relevant part of the kernel log:

erigon invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
CPU: 1 PID: 2138533 Comm: erigon Not tainted 5.10.0-18-amd64 #1 Debian 5.10.140-1
Hardware name: [...]
Call Trace:
 dump_stack+[...]
 dump_header+[...]
 oom_kill_process.cold+[...]
 out_of_memory+[...]
 mem_cgroup_out_of_memory+[...]
 try_charge+[...]
 ? __alloc_pages_nodemask+[...]
 mem_cgroup_charge+[...]
 handle_mm_fault+[...]
 do_user_addr_fault+[...]
 exc_page_fault+[...]
 ? asm_exc_page_fault+[...]
 asm_exc_page_fault+[...]
[...]
memory: usage 7340032kB, limit 7340032kB, failcnt 557044
[...]
kernel: Memory cgroup out of memory: Killed process 2138343 (erigon) total-vm:15531473176kB, anon-rss:6987456kB, file-rss:4186596kB, shmem-rss:0kB, UID:1000 pgtables:291232kB oom_score_adj:0

mizu-eth avatar Sep 22 '22 10:09 mizu-eth

add --pprof flag, when you see high mem usage - do go tool pprof -inuse_space -png http://127.0.0.1:6060/debug/pprof/heap > mem.png

I can try this but the memory usage isn't unusually high so not sure what it will show.

mizu-eth avatar Sep 22 '22 10:09 mizu-eth

To reiterate: I did restrict the container's memory to 7GB for this test, otherwise I would have to wait potentially days to be able to reproduce the OOM kill on my full 16GB of RAM.

mizu-eth avatar Sep 22 '22 10:09 mizu-eth

Reduce —batchSize

AskAlexSharov avatar Sep 22 '22 11:09 AskAlexSharov

Reduce —batchSize

Thank you very much. Although this does not appear to provide strict guarantees as to max memory usage, it does help a lot in practice. I set --batchSize=32M (instead of the default of 256M Ggas) and Erigon is now consuming less than 9GB at all times whereas with the default setting, it could regularly consume up to 12GB (executing blocks near 13,500,000).

An option to finalise the batch at a certain memory threshold (as opposed to gas threshold) might be even better, but --batchSize will probably satisfy my needs :)

I seem to be getting an average of around 19.8 blocks/s with --batchSize=256M and 19.5 blocks/s with --batchSize=32M (and of course, I realise this tiny difference between the two speeds is not statistically meaningful). I calculated those speeds myself over a 1 to 3 hour period, based on timestamps and number= in the logs; this isn't the blk/s= log output. So I'm wondering if there really is that great of an advantage to using a larger batch size vs 32M (which results in a batch being written every few minutes). If you don't mind me asking, was there any particular reason for choosing 256M as the default?

mizu-eth avatar Sep 22 '22 17:09 mizu-eth

  1. It depends on your disk speed
  2. It also may depend on - current state size (progress). bigger state - more random reads.
  3. Yes, we need choose batchSize depend on available memory by default.

AskAlexSharov avatar Sep 23 '22 01:09 AskAlexSharov

I see, thanks. I don't know if the current OOM kill situation is expected behaviour (I don't really know how it could be prevented in the general case, but you know better than me). So I won't close this issue but feel free to do so yourself of course.

mizu-eth avatar Sep 23 '22 05:09 mizu-eth

btw, I'm seeing what seems like a similar situation when trying to run erigon on a RasberryPi with 8gb ram. I have swap setup so erigon never gets killed, but it goes so far into using swap that it crawls to an halt.

Params:

--chain mainnet --blockDownloaderWindow 32768 --prune=htc --prune.r.before=11052984 --authrpc.jwtsecret=/etc/ethereum/jwtsecret --datadir /home/ethereum/.erigon --private.api.addr= --authrpc.addr 0.0.0.0 --authrpc.vhosts=xxx

Memory over 24 hours looks like this: image

Logs are generally un-interesting, it gets stuck in 7/16 executing I'm running with --pprof right now and going to see if I can catch high memory usage.

I tried --batchSize 32M but it didn't seem to make a difference, maybe I need a smaller value for 8gb ram?

okjustgo avatar Sep 28 '22 02:09 okjustgo

Try smaller

AskAlexSharov avatar Sep 28 '22 03:09 AskAlexSharov

Also you can try add to cmd/erigon/main.go line compress.SetDecompressionTableCondensity(6)

and add --txpool.disable

AskAlexSharov avatar Sep 28 '22 03:09 AskAlexSharov

Also you can try add to cmd/erigon/main.go line compress.SetDecompressionTableCondensity(6)

and add --txpool.disable

hmm, I don't see compress.SetDecompressionTableCondensity(6) in https://github.com/ledgerwatch/erigon/blob/stable/cmd/erigon/main.go or https://github.com/ledgerwatch/erigon/blob/devel/cmd/erigon/main.go

okjustgo avatar Sep 28 '22 03:09 okjustgo

there is none. you can add this line.

AskAlexSharov avatar Sep 28 '22 03:09 AskAlexSharov

got it, does it matter where? I'd probably add it above these statements:

https://github.com/ledgerwatch/erigon/blob/c6a000997e39b4bb89a2b73403dbdf9fb16780c0/cmd/erigon/main.go#L56-L58

okjustgo avatar Sep 28 '22 03:09 okjustgo

Try smaller

Tried --batchSize 24M doesn't seem like it slowed down the time it takes to dip into swap at all. Idk if there is a way to confirm the setting got set properly. If I get time I can try the other approaches later

Here is the pprof output for --batchSize 24M @ 70% swap usage

mem-24M-70

Logs at that time were:

Sep 28 04:04:34 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:04:34.082] [7/16 Execution] Executed blocks         number=2436561 blk/s=15.7 tx/s=91.1 Mgas/s=4.0 gasState=0.41 batch=65.2MB alloc=9.3GB sys=9.8GB
Sep 28 04:04:53 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:04:53.910] [7/16 Execution] Executed blocks         number=2437003 blk/s=22.4 tx/s=162.6 Mgas/s=6.1 gasState=0.42 batch=81.4MB alloc=10.0GB sys=10.6GB
Sep 28 04:05:11 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:05:11.110] [p2p] GoodPeers                          eth66=84
Sep 28 04:05:11 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:05:11.167] [txpool] stat                            block=14999999 pending=559 baseFee=0 queued=2827 alloc=10.8GB sys=11.4GB
Sep 28 04:05:13 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:05:13.962] [7/16 Execution] Executed blocks         number=2437451 blk/s=22.2 tx/s=177.2 Mgas/s=7.1 gasState=0.42 batch=101.4MB alloc=10.9GB sys=11.5GB
Sep 28 04:05:33 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:05:33.937] [7/16 Execution] Executed blocks         number=2437947 blk/s=24.8 tx/s=192.1 Mgas/s=7.8 gasState=0.42 batch=121.0MB alloc=11.9GB sys=12.6GB
Sep 28 04:05:53 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:05:53.783] [7/16 Execution] Executed blocks         number=2438139 blk/s=9.7 tx/s=96.5 Mgas/s=3.4 gasState=0.42 batch=126.9MB alloc=8.3GB sys=12.9GB
Sep 28 04:06:11 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:06:11.135] [txpool] stat                            block=14999999 pending=559 baseFee=0 queued=2830 alloc=8.6GB sys=12.9GB
Sep 28 04:06:14 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:06:14.638] [7/16 Execution] Executed blocks         number=2438274 blk/s=6.5 tx/s=61.2 Mgas/s=2.2 gasState=0.42 batch=132.3MB alloc=8.6GB sys=12.9GB
Sep 28 04:06:33 ethereumonarm-9a6eeddd5 erigon[256413]: [INFO] [09-28|04:06:33.910] [7/16 Execution] Executed blocks         number=2438433 blk/s=8.3 tx/s=70.0 Mgas/s=2.2 gasState=0.43 batch=136.2MB alloc=9.0GB sys=12.9GB

okjustgo avatar Sep 28 '22 04:09 okjustgo

  1. one red square on picture can be improved by compress.SetDecompressionTableCondensity(6)
  2. another it's --batchSize, try reduce it more. from this part of logs I don't see value of batch=136.2MB before commit

AskAlexSharov avatar Sep 28 '22 07:09 AskAlexSharov

and add --txpool.disable

Just for my understanding, the --txpool.disable will free up some memory during the execution phase, but needs to be turned back on once the node is sycned? Is this correct?

okjustgo avatar Sep 28 '22 15:09 okjustgo

compress.SetDecompressionTableCondensity(6) is working much better now (using alpha branch, devel seems to be broken)

with --batchSize 16M I'm seeing batch hit 64MB before committing memory is staying much lower, I'll increase batchSize and see what I get.

Sep 28 15:55:01 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:55:01.406] [Snapshots] Stat                         blocks=15000k indices=15000k alloc=602.8MB sys=910.1MB
Sep 28 15:55:06 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:55:06.678] [7/16 Execution] Blocks execution        from=4235123 to=14999999
Sep 28 15:55:26 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:55:26.719] [7/16 Execution] Executed blocks         number=4235921 blk/s=39.8 tx/s=2765.6 Mgas/s=138.1 gasState=0.08 batch=4.9MB alloc=846.4MB sys=1.3GB
Sep 28 15:55:46 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:55:46.723] [7/16 Execution] Executed blocks         number=4236496 blk/s=28.7 tx/s=3308.6 Mgas/s=126.7 gasState=0.15 batch=9.7MB alloc=1.1GB sys=1.3GB
Sep 28 15:55:53 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:55:53.428] [txpool] stat                            block=14999999 pending=1209 baseFee=0 queued=6948 alloc=783.2MB sys=1.3GB
Sep 28 15:56:06 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:56:06.683] [7/16 Execution] Executed blocks         number=4237082 blk/s=29.4 tx/s=3577.2 Mgas/s=139.8 gasState=0.24 batch=14.7MB alloc=788.5MB sys=1.3GB
Sep 28 15:56:26 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:56:26.695] [7/16 Execution] Executed blocks         number=4237596 blk/s=25.7 tx/s=4156.7 Mgas/s=155.9 gasState=0.33 batch=20.4MB alloc=1.1GB sys=1.3GB
Sep 28 15:56:46 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:56:46.722] [7/16 Execution] Executed blocks         number=4238137 blk/s=27.0 tx/s=4094.5 Mgas/s=165.7 gasState=0.42 batch=26.6MB alloc=903.2MB sys=1.3GB
Sep 28 15:56:53 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:56:53.412] [p2p] GoodPeers                          eth66=5
Sep 28 15:56:53 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:56:53.433] [txpool] stat                            block=14999999 pending=1209 baseFee=0 queued=6981 alloc=1.2GB sys=1.3GB
Sep 28 15:57:06 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:57:06.707] [7/16 Execution] Executed blocks         number=4238762 blk/s=31.3 tx/s=3947.4 Mgas/s=195.8 gasState=0.54 batch=35.0MB alloc=700.5MB sys=1.3GB
Sep 28 15:57:26 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:57:26.682] [7/16 Execution] Executed blocks         number=4239534 blk/s=38.6 tx/s=4206.2 Mgas/s=193.2 gasState=0.65 batch=42.7MB alloc=1.2GB sys=1.4GB
Sep 28 15:57:46 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:57:46.680] [7/16 Execution] Executed blocks         number=4240216 blk/s=34.1 tx/s=5248.3 Mgas/s=190.9 gasState=0.76 batch=49.9MB alloc=1.0GB sys=1.5GB
Sep 28 15:57:53 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:57:53.426] [txpool] stat                            block=14999999 pending=1206 baseFee=0 queued=7097 alloc=751.1MB sys=1.5GB
Sep 28 15:58:06 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:58:06.685] [7/16 Execution] Executed blocks         number=4240988 blk/s=38.6 tx/s=4716.8 Mgas/s=221.3 gasState=0.89 batch=57.7MB alloc=849.4MB sys=1.5GB
Sep 28 15:58:22 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:58:22.152] Committed State                          gas reached=34366195645 gasTarget=34359738368
Sep 28 15:58:41 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:58:41.735] [7/16 Execution] Executed blocks         number=4241609 blk/s=17.7 tx/s=2304.0 Mgas/s=109.1 gasState=0.00 batch=0B alloc=980.8MB sys=1.8GB
Sep 28 15:58:46 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:58:46.686] [7/16 Execution] Executed blocks         number=4241788 blk/s=36.2 tx/s=4387.8 Mgas/s=208.2 gasState=0.03 batch=1.5MB alloc=1.2GB sys=1.8GB
Sep 28 15:58:53 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:58:53.413] [p2p] GoodPeers                          eth66=12
Sep 28 15:58:53 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:58:53.430] [txpool] stat                            block=14999999 pending=1206 baseFee=0 queued=7103 alloc=730.2MB sys=1.8GB
Sep 28 15:59:06 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:59:06.699] [7/16 Execution] Executed blocks         number=4242663 blk/s=43.7 tx/s=4682.0 Mgas/s=267.7 gasState=0.19 batch=12.8MB alloc=953.5MB sys=1.8GB
Sep 28 15:59:26 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:59:26.681] [7/16 Execution] Executed blocks         number=4243384 blk/s=36.1 tx/s=5239.9 Mgas/s=214.8 gasState=0.31 batch=20.2MB alloc=957.2MB sys=1.8GB
Sep 28 15:59:46 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:59:46.696] [7/16 Execution] Executed blocks         number=4244113 blk/s=36.4 tx/s=5765.0 Mgas/s=233.3 gasState=0.45 batch=28.2MB alloc=961.1MB sys=1.8GB
Sep 28 15:59:54 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|15:59:54.572] [txpool] stat                            block=14999999 pending=1206 baseFee=0 queued=7117 alloc=806.8MB sys=1.8GB
Sep 28 16:00:06 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|16:00:06.687] [7/16 Execution] Executed blocks         number=4244836 blk/s=36.2 tx/s=5302.1 Mgas/s=231.6 gasState=0.58 batch=36.3MB alloc=897.6MB sys=1.8GB
Sep 28 16:00:26 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|16:00:26.688] [7/16 Execution] Executed blocks         number=4245590 blk/s=37.7 tx/s=4829.3 Mgas/s=233.6 gasState=0.72 batch=45.7MB alloc=855.0MB sys=1.8GB
Sep 28 16:00:46 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|16:00:46.697] [7/16 Execution] Executed blocks         number=4246391 blk/s=40.0 tx/s=5129.1 Mgas/s=247.2 gasState=0.86 batch=54.6MB alloc=1.3GB sys=1.8GB
Sep 28 16:00:53 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|16:00:53.412] [p2p] GoodPeers                          eth66=19
Sep 28 16:00:53 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|16:00:53.435] [txpool] stat                            block=14999999 pending=1206 baseFee=0 queued=7188 alloc=1.1GB sys=1.8GB
Sep 28 16:01:06 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|16:01:06.698] [7/16 Execution] Executed blocks         number=4247112 blk/s=36.0 tx/s=4882.3 Mgas/s=225.2 gasState=0.99 batch=63.0MB alloc=1.2GB sys=1.8GB
Sep 28 16:01:07 ethereumonarm-9a6eeddd5 erigon[283197]: [INFO] [09-28|16:01:07.826] Committed State                          gas reached=34359911584 gasTarget=34359738368

okjustgo avatar Sep 28 '22 16:09 okjustgo

Hmm, once I increased to --batchSize 256M and it started writing out to the DB I got this error:

Staged Sync err="runtime error: invalid memory address or nil pointer dereference, snapshot: 4000000-4500000, trace: [block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172], trace: [stageloop.go:135 panic.go:890 block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]

Logs:

Sep 28 16:43:37 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:43:37.981] [7/16 Execution] Executed blocks         number=4349182 blk/s=34.9 tx/s=3439.3 Mgas/s=195.5 gasState=0.99 batch=1.1GB alloc=2.4GB sys=5.1GB
Sep 28 16:43:57 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:43:57.982] [7/16 Execution] Executed blocks         number=4350048 blk/s=43.3 tx/s=5085.9 Mgas/s=253.4 gasState=1.00 batch=1.1GB alloc=3.7GB sys=5.1GB
Sep 28 16:44:09 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:44:09.670] Committed State                          gas reached=549756019262 gasTarget=549755813888
Sep 28 16:44:29 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:44:29.482] [p2p] GoodPeers                          eth66=56
Sep 28 16:44:29 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:44:29.510] [txpool] stat                            block=14999999 pending=3375 baseFee=0 queued=9351 alloc=2.4GB sys=6.5GB
Sep 28 16:44:32 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:44:32.749] Flushed buffer file                      name=/home/ethereum/.erigon/etl-temp/erigon-sortable-buf-3214123939 alloc=2.5GB sys=6.5GB
Sep 28 16:44:50 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:44:50.916] Flushed buffer file                      name=/home/ethereum/.erigon/etl-temp/erigon-sortable-buf-450471080 alloc=2.8GB sys=6.5GB
Sep 28 16:44:50 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:44:50.917] Write to db                              progress=6.0M/19.0M current table=PlainState
Sep 28 16:44:59 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:44:59.828] Flushed buffer file                      name=/home/ethereum/.erigon/etl-temp/erigon-sortable-buf-741370933 alloc=2.9GB sys=6.5GB
Sep 28 16:45:29 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:45:29.498] [txpool] stat                            block=14999999 pending=3375 baseFee=0 queued=9369 alloc=3.5GB sys=6.5GB
Sep 28 16:45:29 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:45:29.830] [] ETL [2/2] Loading                     into=PlainState current key=6090a6e4... alloc=3.5GB sys=6.5GB
Sep 28 16:45:59 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:45:59.830] [] ETL [2/2] Loading                     into=PlainState current key=8d12a197... alloc=4.2GB sys=6.5GB
Sep 28 16:46:29 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:46:29.482] [p2p] GoodPeers                          eth66=59
Sep 28 16:46:29 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:46:29.504] [txpool] stat                            block=14999999 pending=3375 baseFee=0 queued=9578 alloc=2.4GB sys=6.5GB
Sep 28 16:46:29 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:46:29.831] [] ETL [2/2] Loading                     into=PlainState current key=ca8a9168... alloc=2.5GB sys=6.5GB
Sep 28 16:46:59 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:46:59.831] [] ETL [2/2] Loading                     into=PlainState current key=f598e316... alloc=3.0GB sys=6.5GB
Sep 28 16:47:04 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:47:04.174] [] etl: temp files removed               total size=432.3MB
Sep 28 16:47:04 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:47:04.176] Write to db                              progress=7.6M/19.0M current table=Code
Sep 28 16:47:21 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:47:21.941] [7/16 Execution] Executed blocks         number=4350455 blk/s=2.0 tx/s=204.4 Mgas/s=11.6 gasState=0.00 batch=0B alloc=3.6GB sys=6.5GB
Sep 28 16:47:29 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:47:29.499] [txpool] stat                            block=14999999 pending=3373 baseFee=0 queued=9939 alloc=3.9GB sys=6.5GB
Sep 28 16:47:37 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:47:37.971] [7/16 Execution] Executed blocks         number=4350799 blk/s=21.5 tx/s=2183.8 Mgas/s=121.8 gasState=0.00 batch=4.3MB alloc=4.2GB sys=6.5GB
Sep 28 16:47:58 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:47:58.012] [7/16 Execution] Executed blocks         number=4351239 blk/s=22.0 tx/s=2711.0 Mgas/s=125.9 gasState=0.01 batch=9.6MB alloc=1.4GB sys=6.5GB
Sep 28 16:48:11 ethereumonarm-9a6eeddd5 erigon[283293]: [EROR] [09-28|16:48:11.977] RPC method eth_getBlockByNumber crashed: runtime error: invalid memory address or nil pointer dereference, snapshot: 4000000-4500000, trace: [block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:544 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 eth_api.go:169 eth_api.go:149 eth_api.go:246 eth_block.go:359 eth_block.go:194 value.go:584 value.go:368 service.go:222 handler.go:494 handler.go:444 handler.go:392 handler.go:223 handler.go:316 asm_arm64.s:1172]
Sep 28 16:48:11 ethereumonarm-9a6eeddd5 erigon[283293]: [service.go:217 panic.go:884 block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:544 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 eth_api.go:169 eth_api.go:149 eth_api.go:246 eth_block.go:359 eth_block.go:194 value.go:584 value.go:368 service.go:222 handler.go:494 handler.go:444 handler.go:392 handler.go:223 handler.go:316 asm_arm64.s:1172]
Sep 28 16:48:11 ethereumonarm-9a6eeddd5 erigon[283293]: [EROR] [09-28|16:48:11.978] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, snapshot: 4000000-4500000, trace: [block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172], trace: [stageloop.go:135 panic.go:890 block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 28 16:48:11 ethereumonarm-9a6eeddd5 erigon[283293]: [WARN] [09-28|16:48:11.979] Served                                   conn=192.168.7.233:38730 method=eth_getBlockByNumber reqid=1 t=8.794201ms err="method handler crashed"
Sep 28 16:48:12 ethereumonarm-9a6eeddd5 erigon[283293]: [INFO] [09-28|16:48:12.538] [7/16 Execution] Blocks execution        from=4350455 to=14999999
Sep 28 16:48:12 ethereumonarm-9a6eeddd5 erigon[283293]: [EROR] [09-28|16:48:12.541] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, snapshot: 4000000-4500000, trace: [block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172], trace: [stageloop.go:135 panic.go:890 block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"

Update: I see this with --batchSize 32M as well. Happens right after state gets committed

more logs

Sep 28 17:03:41 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:03:41.191] [txpool] stat                            block=14999999 pending=4051 baseFee=0 queued=8453 alloc=1.4GB sys=2.3GB
Sep 28 17:03:59 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:03:59.411] [7/16 Execution] Executed blocks         number=4371527 blk/s=39.2 tx/s=3612.9 Mgas/s=211.6 gasState=0.75 batch=98.2MB alloc=1.1GB sys=2.3GB
Sep 28 17:04:19 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:04:19.417] [7/16 Execution] Executed blocks         number=4372520 blk/s=49.6 tx/s=4454.6 Mgas/s=235.2 gasState=0.81 batch=107.3MB alloc=1.5GB sys=2.3GB
Sep 28 17:04:39 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:04:39.419] [7/16 Execution] Executed blocks         number=4373742 blk/s=61.1 tx/s=4636.8 Mgas/s=232.5 gasState=0.88 batch=115.3MB alloc=1.2GB sys=2.3GB
Sep 28 17:04:41 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:04:41.185] [txpool] stat                            block=14999999 pending=4051 baseFee=0 queued=8483 alloc=1.3GB sys=2.3GB
Sep 28 17:04:59 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:04:59.422] [7/16 Execution] Executed blocks         number=4374853 blk/s=55.5 tx/s=4687.8 Mgas/s=226.1 gasState=0.95 batch=122.2MB alloc=943.8MB sys=2.3GB
Sep 28 17:05:15 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:05:15.155] Committed State                          gas reached=68723182638 gasTarget=68719476736
Sep 28 17:05:41 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:05:41.178] [p2p] GoodPeers                          eth66=26
Sep 28 17:05:41 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:05:41.258] [txpool] stat                            block=14999999 pending=4049 baseFee=0 queued=9536 alloc=1.4GB sys=2.4GB
Sep 28 17:05:46 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:05:46.906] [7/16 Execution] Executed blocks         number=4375975 blk/s=23.6 tx/s=1551.5 Mgas/s=74.8 gasState=0.00 batch=0B alloc=1.5GB sys=2.4GB
Sep 28 17:05:59 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:05:59.411] [7/16 Execution] Executed blocks         number=4376697 blk/s=57.7 tx/s=4807.7 Mgas/s=222.4 gasState=0.04 batch=4.4MB alloc=1.1GB sys=2.4GB
Sep 28 17:06:19 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:19.408] [7/16 Execution] Executed blocks         number=4377697 blk/s=50.0 tx/s=4859.9 Mgas/s=221.2 gasState=0.10 batch=11.9MB alloc=1.1GB sys=2.4GB
Sep 28 17:06:39 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:39.422] [7/16 Execution] Executed blocks         number=4378669 blk/s=48.6 tx/s=4728.4 Mgas/s=203.5 gasState=0.16 batch=19.4MB alloc=1.1GB sys=2.4GB
Sep 28 17:06:41 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:41.185] [txpool] stat                            block=14999999 pending=4049 baseFee=0 queued=9617 alloc=1.2GB sys=2.4GB
Sep 28 17:06:48 ethereumonarm-9a6eeddd5 erigon[284629]: [EROR] [09-28|17:06:48.590] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, trace: [stageloop.go:135 panic.go:890 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:544 block_reader.go:510 block_reader.go:320 block_snapshots.go:275 block_snapshots.go:733 block_reader.go:319 stage_execute.go:148 evm.go:103 instructions.go:483 interpreter.go:326 evm.go:67 evm.go:236 instructions.go:723 interpreter.go:326 evm.go:67 evm.go:236 state_transition.go:418 state_transition.go:232 state_processor.go:48 state_processor.go:105 blockchain.go:274 stage_execute.go:172 stage_execute.go:477 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 28 17:06:49 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:49.148] [7/16 Execution] Blocks execution        from=4375975 to=14999999
Sep 28 17:06:49 ethereumonarm-9a6eeddd5 erigon[284629]: [EROR] [09-28|17:06:49.176] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, trace: [stageloop.go:135 panic.go:890 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:510 block_reader.go:432 block_snapshots.go:275 block_snapshots.go:733 block_reader.go:431 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 28 17:06:49 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:49.677] [7/16 Execution] Blocks execution        from=4375975 to=14999999
Sep 28 17:06:49 ethereumonarm-9a6eeddd5 erigon[284629]: [EROR] [09-28|17:06:49.706] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, trace: [stageloop.go:135 panic.go:890 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:510 block_reader.go:432 block_snapshots.go:275 block_snapshots.go:733 block_reader.go:431 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 28 17:06:50 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:50.207] [7/16 Execution] Blocks execution        from=4375975 to=14999999
Sep 28 17:06:50 ethereumonarm-9a6eeddd5 erigon[284629]: [EROR] [09-28|17:06:50.236] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, trace: [stageloop.go:135 panic.go:890 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:510 block_reader.go:432 block_snapshots.go:275 block_snapshots.go:733 block_reader.go:431 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 28 17:06:50 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:50.737] [7/16 Execution] Blocks execution        from=4375975 to=14999999
Sep 28 17:06:50 ethereumonarm-9a6eeddd5 erigon[284629]: [EROR] [09-28|17:06:50.765] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, trace: [stageloop.go:135 panic.go:890 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:510 block_reader.go:432 block_snapshots.go:275 block_snapshots.go:733 block_reader.go:431 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 28 17:06:51 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:51.266] [7/16 Execution] Blocks execution        from=4375975 to=14999999
Sep 28 17:06:51 ethereumonarm-9a6eeddd5 erigon[284629]: [EROR] [09-28|17:06:51.296] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, trace: [stageloop.go:135 panic.go:890 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:510 block_reader.go:432 block_snapshots.go:275 block_snapshots.go:733 block_reader.go:431 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 28 17:06:51 ethereumonarm-9a6eeddd5 erigon[284629]: [INFO] [09-28|17:06:51.796] [7/16 Execution] Blocks execution        from=4375975 to=14999999

Update:

perhaps this is part of the issue: https://github.com/ledgerwatch/erigon-lib/blob/ec49625cd9b5e230aed091fd1da4330cd9c60ba4/compress/decompress.go#L431-L432

cw := table.condensedTableSearch(code)
l = cw.len

table.condensedTableSearch(...) can return nil, then the next line immediately dereferences it

okjustgo avatar Sep 28 '22 16:09 okjustgo

try add --downloader.verify flag

AskAlexSharov avatar Sep 29 '22 01:09 AskAlexSharov

try add --downloader.verify flag

adding --downloader.verify did not resolve the problem, still throws an exception eventually. It doesn't look like it's a particular bad block, it always gets a bit of distance before it throws, usually after it saves the state.

Yep, it made it awhile this time before crashing, but eventually died

Sep 29 17:24:42 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:24:42.091] [p2p] GoodPeers                          eth66=87
Sep 29 17:24:42 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:24:42.135] [txpool] stat                            block=14999999 pending=9993 baseFee=0 queued=22537 alloc=1.6GB sys=4.0GB
Sep 29 17:24:59 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:24:59.738] [] ETL [2/2] Loading                     into=PlainState current key=80fb784b... alloc=1.9GB sys=4.0GB
Sep 29 17:25:29 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:25:29.739] [] ETL [2/2] Loading                     into=PlainState current key=8d12a197... alloc=1.5GB sys=4.0GB
Sep 29 17:25:42 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:25:42.120] [txpool] stat                            block=14999999 pending=9993 baseFee=0 queued=22551 alloc=1.8GB sys=4.0GB
Sep 29 17:25:59 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:25:59.907] [] ETL [2/2] Loading                     into=PlainState current key=b1690c08... alloc=2.2GB sys=4.0GB
Sep 29 17:26:29 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:26:29.848] [] ETL [2/2] Loading                     into=PlainState current key=d1c4c294... alloc=2.9GB sys=4.0GB
Sep 29 17:26:42 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:26:42.091] [p2p] GoodPeers                          eth66=88
Sep 29 17:26:42 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:26:42.148] [txpool] stat                            block=14999999 pending=9993 baseFee=0 queued=22570 alloc=1.6GB sys=4.0GB
Sep 29 17:27:00 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:27:00.056] [] ETL [2/2] Loading                     into=PlainState current key=f230b790... alloc=2.0GB sys=4.0GB
Sep 29 17:27:05 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:27:05.098] Write to db                              progress=2.8M/4.9M current table=Code
Sep 29 17:27:14 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:27:14.828] Write to db                              progress=3.0M/4.9M current table=IncarnationMap
Sep 29 17:27:18 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:27:18.686] [7/16 Execution] Executed blocks         number=4976559 blk/s=1.0 tx/s=159.5 Mgas/s=6.7 gasState=0.00 batch=0B alloc=1.6GB sys=4.0GB
Sep 29 17:27:32 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:27:32.566] [7/16 Execution] Executed blocks         number=4976667 blk/s=7.8 tx/s=1326.3 Mgas/s=53.9 gasState=0.01 batch=1.3MB alloc=1.2GB sys=4.0GB
Sep 29 17:27:42 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:27:42.118] [txpool] stat                            block=14999999 pending=9993 baseFee=0 queued=22617 alloc=819.3MB sys=4.0GB
Sep 29 17:27:52 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:27:52.614] [7/16 Execution] Executed blocks         number=4976886 blk/s=10.9 tx/s=1908.8 Mgas/s=74.6 gasState=0.02 batch=4.1MB alloc=1.2GB sys=4.0GB
Sep 29 17:28:11 ethereumonarm-9a6eeddd5 erigon[315830]: [EROR] [09-29|17:28:11.981] RPC method eth_getBlockByNumber crashed: runtime error: invalid memory address or nil pointer dereference, snapshot: 4500000-5000000, trace: [block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 eth_api.go:169 eth_api.go:149 eth_api.go:246 eth_block.go:359 eth_block.go:194 value.go:584 value.go:368 service.go:222 handler.go:494 handler.go:444 handler.go:392 handler.go:223 handler.go:316 asm_arm64.s:1172]
Sep 29 17:28:11 ethereumonarm-9a6eeddd5 erigon[315830]: [service.go:217 panic.go:884 block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 eth_api.go:169 eth_api.go:149 eth_api.go:246 eth_block.go:359 eth_block.go:194 value.go:584 value.go:368 service.go:222 handler.go:494 handler.go:444 handler.go:392 handler.go:223 handler.go:316 asm_arm64.s:1172]
Sep 29 17:28:11 ethereumonarm-9a6eeddd5 erigon[315830]: [EROR] [09-29|17:28:11.981] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, snapshot: 4500000-5000000, trace: [block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:544 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172], trace: [stageloop.go:135 panic.go:890 block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:544 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 29 17:28:11 ethereumonarm-9a6eeddd5 erigon[315830]: [WARN] [09-29|17:28:11.984] Served                                   conn=192.168.7.233:60178 method=eth_getBlockByNumber reqid=1 t=15.863296ms err="method handler crashed"
Sep 29 17:28:12 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:28:12.484] [7/16 Execution] Blocks execution        from=4976559 to=14999999
Sep 29 17:28:12 ethereumonarm-9a6eeddd5 erigon[315830]: [EROR] [09-29|17:28:12.487] Staged Sync                              err="runtime error: invalid memory address or nil pointer dereference, snapshot: 4500000-5000000, trace: [block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172], trace: [stageloop.go:135 panic.go:890 block_reader.go:609 panic.go:884 panic.go:260 signal_unix.go:835 decompress.go:432 decompress.go:524 block_reader.go:634 block_reader.go:467 block_snapshots.go:319 block_snapshots.go:745 block_reader.go:466 stage_execute.go:462 default_stages.go:101 sync.go:352 sync.go:255 stageloop.go:169 stageloop.go:90 asm_arm64.s:1172]"
Sep 29 17:28:12 ethereumonarm-9a6eeddd5 erigon[315830]: [INFO] [09-29|17:28:12.988] [7/16 Execution] Blocks execution        from=4976559 to=14999999

okjustgo avatar Sep 29 '22 15:09 okjustgo

Update here, I finally got through the 7/16 Execution phase, it took 3 weeks 😁 with --batchsize 24M and --txpool.disable.

It looks like there is another possible leak in the 13/16 LogIndex phase. Overall memory consumption hit 12GB or so (on an 8GB RasPI 4B) I assume the heavy use of swap here will kill performance. I've restarted with --pprof so I can capture a memory profile. In the logs below we can see the 10GB of sys usage 🙁

Logs:

Oct 23 18:03:13 ethereumonarm-9a6eeddd5 erigon[987195]: [INFO] [10-23|18:03:13.415] [13/16 LogIndex] ETL [2/2] Loading       into=LogTopicIndex current key=00000000000000000000000003251ae9065a02d50230e02b137265157e101c2000da3d3a alloc=6.9GB sys=10.5GB
Oct 23 18:03:43 ethereumonarm-9a6eeddd5 erigon[987195]: [INFO] [10-23|18:03:43.340] [13/16 LogIndex] ETL [2/2] Loading       into=LogTopicIndex current key=000000000000000000000000039ade827d952ba6d99d2ee75b77ee0b4d189bf9ffffffff alloc=4.4GB sys=10.5GB
Oct 23 18:04:13 ethereumonarm-9a6eeddd5 erigon[987195]: [INFO] [10-23|18:04:13.347] [13/16 LogIndex] ETL [2/2] Loading       into=LogTopicIndex current key=00000000000000000000000003e14fc8b4e81b9297f071b0b4747959d6d0de23ffffffff alloc=5.0GB sys=10.5GB
Oct 23 18:04:44 ethereumonarm-9a6eeddd5 erigon[987195]: [INFO] [10-23|18:04:44.481] [13/16 LogIndex] ETL [2/2] Loading       into=LogTopicIndex current key=00000000000000000000000003f7724180aa6b939894b5ca4314783b0b36b32900cd3080 alloc=6.5GB sys=10.5GB
Oct 23 18:05:08 ethereumonarm-9a6eeddd5 erigon[987195]: [INFO] [10-23|18:05:08.662] [p2p] GoodPeers                          eth66=90
Oct 23 18:05:13 ethereumonarm-9a6eeddd5 erigon[987195]: [INFO] [10-23|18:05:13.628] [13/16 LogIndex] ETL [2/2] Loading       into=LogTopicIndex current key=00000000000000000000000003f7724180aa6b939894b5ca4314783b0b36b32900ce54da alloc=6.6GB sys=10.5GB
Oct 23 18:05:43 ethereumonarm-9a6eeddd5 erigon[987195]: [INFO] [10-23|18:05:43.349] [13/16 LogIndex] ETL [2/2] Loading       into=LogTopicIndex current key=000000000000000000000000048d2c3c467e33a57c420dc92cda7012459270bfffffffff alloc=5.6GB sys=10.5GB

okjustgo avatar Oct 23 '22 18:10 okjustgo

Here's the pprof output for LogIndex phase:

CommandLine:

--chain mainnet --batchSize 24M --txpool.disable --pprof --blockDownloaderWindow 32768 --prune=htc --prune.r.before=11052984 --authrpc.jwtsecret=xxx --datadir /home/ethereum/.erigon --private.api.addr= --authrpc.addr 0.0.0.0 --authrpc.vhosts=xxx

Logs:

Oct 23 23:10:55 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:10:55.773] [13/16 LogIndex] Progress                number=13333575 alloc=3.4GB sys=7.5GB
Oct 23 23:11:25 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:11:25.775] [13/16 LogIndex] Progress                number=13337605 alloc=4.1GB sys=7.5GB
Oct 23 23:11:55 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:11:55.771] [13/16 LogIndex] Progress                number=13341522 alloc=4.7GB sys=7.5GB
Oct 23 23:12:25 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:12:25.771] [13/16 LogIndex] Progress                number=13345609 alloc=5.4GB sys=7.5GB
Oct 23 23:12:33 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:12:33.065] [p2p] GoodPeers                          eth66=100
Oct 23 23:12:58 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:12:58.266] [13/16 LogIndex] Progress                number=13348532 alloc=3.3GB sys=7.5GB
Oct 23 23:13:36 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:13:36.247] Flushed buffer file                      name=/home/ethereum/.erigon/etl-temp/erigon-sortable-buf-406585851 alloc=4.0GB sys=7.5GB
Oct 23 23:13:37 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:13:37.834] [13/16 LogIndex] Progress                number=13350923 alloc=4.0GB sys=7.5GB
Oct 23 23:13:56 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:13:56.565] [13/16 LogIndex] Progress                number=13353535 alloc=4.5GB sys=7.5GB
Oct 23 23:14:25 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:14:25.772] [13/16 LogIndex] Progress                number=13358084 alloc=5.3GB sys=7.5GB
Oct 23 23:14:33 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:14:33.065] [p2p] GoodPeers                          eth66=100
Oct 23 23:14:56 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:14:56.903] [13/16 LogIndex] Progress                number=13362308 alloc=6.0GB sys=7.5GB
Oct 23 23:15:25 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:15:25.772] [13/16 LogIndex] Progress                number=13365721 alloc=3.6GB sys=7.5GB
Oct 23 23:15:55 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:15:55.772] [13/16 LogIndex] Progress                number=13369719 alloc=4.3GB sys=7.5GB
Oct 23 23:16:27 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:16:27.488] [13/16 LogIndex] Progress                number=13373575 alloc=5.0GB sys=7.5GB
Oct 23 23:16:33 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:16:33.065] [p2p] GoodPeers                          eth66=100
Oct 23 23:16:57 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:16:57.600] [13/16 LogIndex] Progress                number=13377713 alloc=5.7GB sys=7.5GB
Oct 23 23:17:25 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:17:25.771] [13/16 LogIndex] Progress                number=13380617 alloc=3.5GB sys=7.5GB
Oct 23 23:17:55 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:17:55.773] [13/16 LogIndex] Progress                number=13383691 alloc=4.3GB sys=7.5GB
Oct 23 23:18:25 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:18:25.774] [13/16 LogIndex] Progress                number=13387795 alloc=5.1GB sys=7.5GB
Oct 23 23:18:33 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:18:33.064] [p2p] GoodPeers                          eth66=100
Oct 23 23:18:55 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:18:55.772] [13/16 LogIndex] Progress                number=13391918 alloc=5.9GB sys=7.5GB
Oct 23 23:19:25 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:19:25.773] [13/16 LogIndex] Progress                number=13395019 alloc=6.6GB sys=7.5GB
Oct 23 23:19:55 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:19:55.773] [13/16 LogIndex] Progress                number=13398816 alloc=4.0GB sys=7.5GB
Oct 23 23:20:25 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:20:25.772] [13/16 LogIndex] Progress                number=13402443 alloc=4.8GB sys=7.5GB
Oct 23 23:20:33 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:20:33.064] [p2p] GoodPeers                          eth66=100
Oct 23 23:20:55 ethereumonarm-9a6eeddd5 erigon[1114328]: [INFO] [10-23|23:20:55.772] [13/16 LogIndex] Progress                number=13406000 alloc=5.5GB sys=7.5GB

pprof: mem-24M-LogIndex

okjustgo avatar Oct 23 '22 23:10 okjustgo

Don't see anything bad in log-index phase.

  • left one: is known issue - we will fix it in future
  • right one: is just work of the stage - 600Mb is expected (we can reduce if need).

AskAlexSharov avatar Oct 24 '22 03:10 AskAlexSharov

This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] avatar Dec 04 '22 02:12 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Dec 11 '22 02:12 github-actions[bot]