Out-Of-Memory

Status: Open. pepsi1k opened this issue 3 years ago • 14 comments

Hello everyone! I'm new to the internals of blockchain nodes, but I need to deploy a bsc node. The problem is that it fills up all the RAM and restarts every 8 hours. After 7 days the node synchronized to the latest block, but judging by the memory graph, it will soon crash again.

Can you help me figure out the problem? Maybe I'm missing some configuration.

[Screenshot: memory usage graph (oom)]

I use an AWS c5.4xlarge instance (16 vCPUs, 32 GiB RAM). Disk: 2 TB gp3, 6k IOPS, NVMe.

I run geth with these options:

```shell
geth \
  --config config.toml \
  --datadir /data/geth \
  --port 30311 --cache 18000 \
  --syncmode full \
  --http --http.addr 0.0.0.0 --http.port 8545 \
  --http.vhosts=* --http.api personal,web3,eth \
  --ws --ws.addr 0.0.0.0 --ws.port 8546 \
  --ws.origins=* --ws.api personal,web3,eth \
  --allow-insecure-unlock \
  --txlookuplimit 0 \
  --nousb
```

config.toml:

```toml
[Eth]
NetworkId = 56
NoPruning = false
NoPrefetch = false
LightPeers = 100
UltraLightFraction = 75
DatabaseFreezer = ""
TrieTimeout = 2500000000000
TrieCleanCache = 256
TrieDirtyCache = 256
EnablePreimageRecording = false
EWASMInterpreter = ""
EVMInterpreter = ""

[Eth.TxPool]
Locals = []
NoLocals = false
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 1000000000
PriceBump = 10
AccountSlots = 512
GlobalSlots = 10000
AccountQueue = 256
GlobalQueue = 5000
Lifetime = 10800000000000

[Eth.GPO]
Blocks = 20
Percentile = 60
OracleThreshold = 1000

[Node]
HTTPHost = "0.0.0.0"
NoUSB = true
InsecureUnlockAllowed = false
IPCPath = "geth.ipc"
HTTPPort = 8545
HTTPVirtualHosts = ["*"]
HTTPModules = ["eth", "net", "web3", "txpool", "parlia"]
WSPort = 8546
WSModules = ["net", "web3", "eth"]

[Node.P2P]
MaxPeers = 300
NoDiscovery = false
BootstrapNodes = ...
StaticNodes = ...
ListenAddr = ":30311"
EnableMsgEvents = false

[Node.HTTPTimeouts]
ReadTimeout = 30000000000
WriteTimeout = 30000000000
IdleTimeout = 120000000000
```

pepsi1k commented on Sep 13 '21

You can try slightly decreasing the --cache value and experimenting with it.

perfectcircle2020 commented on Sep 13 '21

In some earlier geth builds (<1.1.2) I saw that when I passed --cache 10000, it somehow used 20000 MB of RAM. In geth 1.1.2, --cache works as expected: 10 000 = 10 000.

0fuz commented on Sep 16 '21

My node restarted once after synchronization and has now been running stably for 6 days. I will reduce --cache and report back if it crashes again.

pepsi1k commented on Sep 16 '21

I can't say I liked how my bsc node worked, but a shutdown once every 7 days was acceptable. After updating from 1.1.2 to 1.1.3, it crashes about every 12 hours.

[Screenshot]

I've reduced --cache from 18000 to 15000 to 12000, but nothing helped. I really can't figure out why this node uses so much RAM. @0fuz, can you share your bsc config?

pepsi1k commented on Oct 22 '21

My config is similar to the one at https://github.com/binance-chain/bsc/releases/tag/v1.1.2, except --cache = 80% of RAM, peers = 90, lightpeers = 0.

Your geth restarts at some interval. Have you checked the reason in ./node/geth.log or a similar log file? It could help you prepare a bug report if it is a real bug. It seems your geth somehow runs out of RAM and restarts.
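
A minimal sketch of that check, assuming the log lives at ./node/geth.log (the path mentioned above) and that the interesting lines are either geth's own shutdown message "Got interrupt, shutting down..." or the Go runtime's "fatal error: runtime: out of memory". The helper name find_restart_reasons is hypothetical.

```shell
#!/bin/sh
# Sketch: print log lines (with line numbers) that usually explain why geth stopped.
# find_restart_reasons is a hypothetical helper; $1 is the log path, e.g. ./node/geth.log.
find_restart_reasons() {
  # -n adds line numbers, -E enables the | alternation between the two patterns.
  grep -n -E 'Got interrupt, shutting down|fatal error: runtime: out of memory' "$1"
}

# Run only when a log path is given on the command line.
if [ -n "${1:-}" ]; then
  find_restart_reasons "$1"
fi
```

The lines just before each match usually show what the node was doing when it stopped. A process killed outright by the kernel OOM killer leaves no line in geth's own log, so an empty result is also informative.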

0fuz commented on Oct 22 '21

Now it crashed again:

bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:05.118] Deep froze chain segment                 blocks=17  elapsed=50.164ms    number=12,047,106 hash=524a36..28f19e
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:06.334] Imported new chain segment               blocks=1   txs=669  mgas=88.235  elapsed=429.843ms   mgasps=205.272  number=12,137,106 hash=024888..e32eee dirty=1.37GiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:08.054] Imported new chain segment               blocks=1   txs=654  mgas=88.385  elapsed=1.517s      mgasps=58.247   number=12,137,107 hash=d4f07c..f2f772 dirty=1.38GiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:09.697] Imported new chain segment               blocks=1   txs=611  mgas=88.041  elapsed=1.373s      mgasps=64.088   number=12,137,108 hash=834083..d23b9b dirty=1.38GiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:14.794] Downloader queue stats                   receiptTasks=0 blockTasks=0    itemSize=165.96KiB throttle=395
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:30.708] Persisted trie from memory database      nodes=1,215,032 size=345.77MiB time=16.564326552s gcnodes=848,324   gcsize=325.46MiB gctime=4.28344597s  livenodes=2,166,331 livesize=571.66MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:30.712] Imported new chain segment               blocks=1   txs=515  mgas=78.167  elapsed=19.395s     mgasps=4.030    number=12,137,109 hash=407d36..f0f675 dirty=1.38GiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:33.355] Imported new chain segment               blocks=1   txs=476  mgas=75.386  elapsed=2.635s      mgasps=28.608   number=12,137,110 hash=d91153..267690 dirty=960.15MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:37.679] do light process success at block        num=12,137,113
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:40.252] Imported new chain segment               blocks=4   txs=2790 mgas=305.439 elapsed=6.896s      mgasps=44.289   number=12,137,114 hash=f12395..b58381 dirty=982.12MiB  ignored=2
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:42.597] Imported new chain segment               blocks=1   txs=603  mgas=88.373  elapsed=2.264s      mgasps=39.025   number=12,137,115 hash=e33555..01b612 dirty=985.63MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:43.106] Re-queue blocks                          number=12,137,116 hash=0x59b35b44db91cc87ce461480e3f9834c43d4d67d2d027a150f6dc1e80388385a
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:43.137] Chain reorg detected                     number=12,137,114 hash=f12395..b58381 drop=1 dropfrom=e33555..01b612 add=1 addfrom=0a487d..fcaa33
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:06:43.255] Imported new chain segment               blocks=1   txs=604  mgas=88.355  elapsed=648.841ms   mgasps=136.173  number=12,137,115 hash=0a487d..fcaa33 dirty=980.05MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:24.338] Deep froze chain segment                 blocks=9   elapsed=19.023s     number=12,047,115 hash=607e42..b07d9b
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:33.972] Imported new chain segment               blocks=1   txs=535  mgas=88.532  elapsed=50.708s     mgasps=1.746    number=12,137,116 hash=59b35b..88385a dirty=988.05MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:44.707] Imported new chain segment               blocks=1   txs=595  mgas=88.678  elapsed=10.734s     mgasps=8.261    number=12,137,117 hash=3c306e..55ab9d age=1m6s     dirty=991.77MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:44.790] Downloader queue stats                   receiptTasks=0 blockTasks=0    itemSize=159.61KiB throttle=411
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:46.355] Imported new chain segment               blocks=1   txs=546  mgas=88.630  elapsed=1.571s      mgasps=56.384   number=12,137,116 hash=527fcb..dd5793 age=1m5s     dirty=991.21MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:46.454] Imported new chain segment               blocks=1   txs=1    mgas=0.100   elapsed=80.050ms    mgasps=1.253    number=12,137,116 hash=fbed2c..2edcf7 age=1m2s     dirty=991.22MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:48.472] Got interrupt, shutting down... 
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:48.755] HTTP server stopped                      endpoint=[::]:8545
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:48.760] HTTP server stopped                      endpoint=[::]:8546
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:07:48.767] IPC endpoint closed                      url=/data/geth/geth.ipc
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:08:02.311] Imported new chain segment               blocks=1   txs=387  mgas=67.446  elapsed=15.856s     mgasps=4.254    number=12,137,118 hash=73eac5..1ac43f age=1m24s    dirty=997.96MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:08:05.566] Ethereum protocol stopped 
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:08:05.613] Snapshot extension registration failed   peer=1d4fc8e3 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:08:05.722] Transaction pool stopped 
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:08:23.094] Writing cached state to disk             block=12,137,118 hash=73eac5..1ac43f root=508b0f..d830e7
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:08:27.684] Snapshot extension registration failed   peer=98db1968 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:08:30.897] Deep froze chain segment                 blocks=3   elapsed=936.965ms   number=12,047,118 hash=530120..eb1fa6
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:08:59.479] Snapshot extension registration failed   peer=4512109c err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:09:05.902] Snapshot extension registration failed   peer=7a78114f err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:09:08.342] Snapshot extension registration failed   peer=35ba3061 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:09:17.204] Snapshot extension registration failed   peer=17102fd1 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:09:27.405] Snapshot extension registration failed   peer=ee4f6104 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:09:35.117] Snapshot extension registration failed   peer=cf550c17 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:10:00.520] Snapshot extension registration failed   peer=c3c06d02 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:10:05.639] Snapshot extension registration failed   peer=101502b5 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:10:07.016] Snapshot extension registration failed   peer=e3b24acd err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:10:31.989] Snapshot extension registration failed   peer=25567f7e err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:10:34.009] Snapshot extension registration failed   peer=5081959f err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:10:59.195] Snapshot extension registration failed   peer=6de588ae err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:11:08.044] Snapshot extension registration failed   peer=a02a1c1d err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:11:10.812] Snapshot extension registration failed   peer=1f1f9778 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:11:16.423] Snapshot extension registration failed   peer=f054a33f err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:11:22.224] Snapshot extension registration failed   peer=35ba3061 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:11:27.691] Snapshot extension registration failed   peer=a6c868a1 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:11:28.467] Snapshot extension registration failed   peer=1c9b2c4b err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:11:47.823] Snapshot extension registration failed   peer=cbd3f861 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:12:02.457] Snapshot extension registration failed   peer=3e999961 err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:12:03.487] Snapshot extension registration failed   peer=439c4bfd err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:12:06.622] Snapshot extension registration failed   peer=1fbfbdbb err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:12:08.525] Snapshot extension registration failed   peer=207594af err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:12:10.650] Snapshot extension registration failed   peer=89027a8b err="peer connected on snap without compatible eth support"
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:14.055] Persisted trie from memory database      nodes=848,738   size=244.45MiB time=3m50.911150301s gcnodes=54322     gcsize=21.37MiB  gctime=5.001032234s livenodes=1,419,487 livesize=359.22MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:14.064] Writing cached state to disk             block=12,137,117 hash=3c306e..55ab9d root=fdd3cd..3c7429
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:14.271] Persisted trie from memory database      nodes=10555     size=4.12MiB   time=206.022573ms    gcnodes=0         gcsize=0.00B     gctime=0s           livenodes=1,408,932 livesize=355.10MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:14.275] Writing cached state to disk             block=12,136,991 hash=4eba47..ce55c0 root=cae978..55253c
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:15.274] Persisted trie from memory database      nodes=58400     size=22.04MiB  time=998.635994ms    gcnodes=0         gcsize=0.00B     gctime=0s           livenodes=1,350,532 livesize=333.06MiB
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:15.274] Writing snapshot state to disk           root=d5bba8..11bf69
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:15.274] Persisted trie from memory database      nodes=0         size=0.00B     time="3.59µs"        gcnodes=0         gcsize=0.00B     gctime=0s           livenodes=1,350,532 livesize=333.06MiB
bnb_geth.1.3t0fseheuzv8@    | ERROR[10-27|14:12:15.275] Dangling trie nodes after full cleanup 
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:15.278] Writing clean trie cache to disk         path=/data/geth/geth/triecache threads=16
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:22.406] Persisted the clean trie cache           path=/data/geth/geth/triecache elapsed=7.128s
bnb_geth.1.3t0fseheuzv8@    | INFO [10-27|14:12:22.408] Blockchain stopped

pepsi1k commented on Oct 27 '21

"Got interrupt, shutting down..." is written when I press Ctrl+C in geth, and your logs have it too. When I ran geth and kept it running only while I was connected over SSH, I got disconnects and geth sometimes stopped.

You can try starting geth as a daemon service, or with nohup ./geth ....... & and then disown %1 (%1 is the job ID listed by jobs).
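
A minimal sketch of the nohup approach. The helper name run_detached and the log file name are hypothetical, and the geth arguments stay elided as in the comment above. nohup makes the process ignore SIGHUP, the signal a dropped SSH session delivers to the jobs started from it.

```shell
#!/bin/sh
# Sketch: start a long-running command so that it survives the SSH session ending.
# run_detached is a hypothetical helper; $1 is the log file, the rest is the command.
run_detached() {
  log=$1
  shift
  # nohup ignores SIGHUP; stdout and stderr go to the log instead of the terminal.
  nohup "$@" >"$log" 2>&1 &
  # Print the PID so the process can be checked (kill -0) or stopped (kill) later.
  echo $!
}

# Example (arguments elided, as above):
# run_detached geth.log ./geth .......
```

Stopping it later with kill <pid> sends SIGTERM; geth handles SIGTERM the same way as Ctrl+C (SIGINT), which is why the same "Got interrupt, shutting down..." line appears in either case.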

0fuz commented on Oct 27 '21

I don't run geth over an SSH connection; I wrapped it in a Docker container and run it via Docker Swarm. We see the "Got interrupt, shutting down..." log because of the out-of-memory condition: I configured limits.memory: 30Gb (out of 31Gb total), and Docker sends the bnb container a SIGTERM signal if it goes beyond this limit.

pepsi1k commented on Oct 28 '21

Usually, the suggested --cache setting is 1/3 of the system RAM.
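
As a worked example of that rule of thumb: a c5.4xlarge has 32 GiB of RAM, and 32768 / 3 is about 10,900, noticeably less than the --cache 18000 in the original command. A minimal sketch that derives the value on Linux, assuming /proc/meminfo is available (it reports MemTotal in kB); the helper name cache_mb_for is hypothetical.

```shell
#!/bin/sh
# Sketch: suggest a --cache value (in MB) equal to one third of total RAM.
# cache_mb_for is a hypothetical helper; $1 is total RAM in kB.
cache_mb_for() {
  # kB -> MB, then one third (integer division rounds down).
  echo $(( $1 / 1024 / 3 ))
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "--cache $(cache_mb_for "$total_kb")"
fi
```

MemTotal is slightly below the instance's nominal RAM because the kernel reserves some memory, so on a 32 GiB machine the printed value will be a little under 10922.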

unclezoro commented on Nov 30 '21

[Screenshot]

I'm already tired of this node and I want to forget it like a bad dream.

I have tried every --cache setting in [5000, 6000 ... 20000]. My conclusion is that no matter how much cache is allocated, it still eats as much memory as it wants.

- 11/24 - 11/25: For some reason geth-data was using 4 TB of disk space, and I couldn't clear it with snapshot prune (issue #556). I decided to delete the current geth-data and download a snapshot.
- 11/25: After downloading a new snapshot, I was surprised how fast it synced, but after ~4h you can see OOM again, after which syncing was no longer as fast.
- 11/26 - 11/29: At this point I added --rpc.allow-unprotected-txs. You can see that the node no longer crashed as usual, but it synchronized very slowly: in 3 days it could not catch up 30000 blocks.
- 11/29 - 11/30: Updated bsc 1.1.5 -> 1.1.6 and added the --txpool.reannouncetime 5m option. The chain is still 30000 blocks behind.
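
To compare actual memory use with the configured --cache, one option is to sample the process's resident set size (RSS) over time. A minimal sketch, assuming Linux, where /proc/<pid>/status reports VmRSS in kB; rss_mb is a hypothetical helper, and looking up geth's PID with pgrep is an assumption about how the process is named.

```shell
#!/bin/sh
# Sketch: print a process's resident memory (RSS) in MB.
# rss_mb is a hypothetical helper; $1 is the PID. Linux only (/proc).
rss_mb() {
  status="/proc/$1/status"
  # Fail if the process does not exist (or /proc is unavailable).
  [ -r "$status" ] || return 1
  kb=$(awk '/^VmRSS:/ {print $2}' "$status")
  # VmRSS is in kB; treat a missing value as 0.
  echo $(( ${kb:-0} / 1024 ))
}

# Example: log geth's RSS once a minute.
# while sleep 60; do echo "$(date +%T) $(rss_mb "$(pgrep -x geth)") MB"; done
```

Plotting these samples next to the --cache value makes it easy to see how far resident memory grows beyond the configured cache.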

pepsi1k commented on Nov 30 '21

Because there are 4 TB of data on your disk, storage performance will be degraded. You should prune first; otherwise you won't be able to catch up with new blocks.

keefel commented on Dec 17 '21

@KeefeL Yes, I've already run snapshot prune. The node was able to synchronize the 30,000 blocks in a week, but it crashes again due to OOM:

[Screenshot]

Now my bsc node is synced, but it fails every 5-7 days.

pepsi1k commented on Dec 17 '21

We have responded to the question and will close the case, as we haven't received any additional questions in 3 days. Please join our Discord channel for further discussion: https://discord.com/invite/binancesmartchain

RumeelHussainbnb commented on Dec 21 '21

@RumeelHussainbnb The question is still open: it is unclear why the node keeps crashing due to OOM. I have already raised this question in Discord and have not received an answer.

pepsi1k commented on Dec 21 '21