bsc
bsc copied to clipboard
Repeatative OOM since 24 hours from now
System information
Geth version: 1.1.8 OS & Version: Linux
Expected behaviour
No OOM
Actual behaviour
Steps to reproduce the behaviour
This occurs on all of my 4 nodes and from what I hear by other operators too. The node simply runs out of memory minutes after start, crashes, restarts, catches up sync, crashes again etc... Something changed yesterday (see graph above)
Logs
Logs are clear, it just dies and restarts (due to my docker setup) - restart at 04:33:38
:
INFO [03-18|04:33:05.935] Deep froze chain segment blocks=18 elapsed=24.189ms number=16,067,141 hash=68fc5f..c6e2da
INFO [03-18|04:33:12.882] Imported new chain segment blocks=1 txs=211 mgas=25.530 elapsed=6.698s mgasps=3.811 number=16,157,142 hash=7420e9..bde1d3 dirty=368.59MiB
INFO [03-18|04:33:14.384] Imported new chain segment blocks=1 txs=193 mgas=19.015 elapsed=1.501s mgasps=12.665 number=16,157,143 hash=45e41f..7b01ae dirty=369.45MiB
INFO [03-18|04:33:15.461] do light process success at block num=16,157,144
INFO [03-18|04:33:15.524] Imported new chain segment blocks=1 txs=175 mgas=25.484 elapsed=1.083s mgasps=23.517 number=16,157,144 hash=0b1946..f38018 dirty=371.60MiB
INFO [03-18|04:33:15.653] do light process success at block num=16,157,144
INFO [03-18|04:33:15.702] Chain reorg detected number=16,157,143 hash=45e41f..7b01ae drop=1 dropfrom=0b1946..f38018 add=1 addfrom=b62420..dbe214
INFO [03-18|04:33:15.704] Imported new chain segment blocks=1 txs=192 mgas=28.281 elapsed=179.876ms mgasps=157.227 number=16,157,144 hash=b62420..dbe214 dirty=371.26MiB
INFO [03-18|04:33:16.030] Re-queue blocks number=16,157,145 hash=0xb052b61c25a0bafff69a222a2d49e010c9c30d3b3c71edac04de63dd1a92743a
INFO [03-18|04:33:19.147] do light process success at block num=16,157,145
INFO [03-18|04:33:19.291] Imported new chain segment blocks=1 txs=119 mgas=14.642 elapsed=499.742ms mgasps=29.299 number=16,157,145 hash=b052b6..92743a dirty=373.30MiB
INFO [03-18|04:33:20.066] do light process success at block num=16,157,146
INFO [03-18|04:33:20.336] Imported new chain segment blocks=1 txs=156 mgas=21.399 elapsed=982.937ms mgasps=21.771 number=16,157,146 hash=b2f10e..39ae59 dirty=375.01MiB
INFO [03-18|04:33:22.736] Imported new chain segment blocks=1 txs=119 mgas=14.475 elapsed=1.489s mgasps=9.720 number=16,157,147 hash=7abd60..da01fd dirty=375.90MiB
INFO [03-18|04:33:30.007] Imported new chain segment blocks=1 txs=167 mgas=25.477 elapsed=2.279s mgasps=11.178 number=16,157,148 hash=b1c1a2..fa7b07 dirty=377.75MiB
INFO [03-18|04:33:32.917] Imported new chain segment blocks=1 txs=272 mgas=37.766 elapsed=2.348s mgasps=16.078 number=16,157,149 hash=094050..bf3fc2 dirty=380.62MiB
INFO [03-18|04:33:36.320] do light process success at block num=16,157,150
INFO [03-18|04:33:36.789] Imported new chain segment blocks=1 txs=178 mgas=20.199 elapsed=2.870s mgasps=7.037 number=16,157,150 hash=86d09c..a5ccbb dirty=381.59MiB
INFO [03-18|04:33:38.745] Imported new chain segment blocks=1 txs=113 mgas=14.315 elapsed=1.910s mgasps=7.492 number=16,157,151 hash=1b581a..2afc24 dirty=382.04MiB
/entrypoint.sh: assuming arguments for bsc
/entrypoint.sh: setting data directory to /data
INFO [03-18|04:33:42.291] Starting pprof server addr=http://0.0.0.0:6060/debug/pprof
INFO [03-18|04:33:42.293] Starting Geth on Ethereum mainnet...
INFO [03-18|04:33:42.293] Enabling metrics collection
INFO [03-18|04:33:42.299] Maximum peer count ETH=350 LES=0 total=350
INFO [03-18|04:33:42.299] Smartcard socket not found, disabling err="stat /run/pcscd/pcscd.comm: no such file or directory"
WARN [03-18|04:33:42.299] Option nousb is deprecated and USB is deactivated by default. Use --usb to enable
INFO [03-18|04:33:42.300] Set global gas cap cap=25,000,000
INFO [03-18|04:33:42.301] Allocated trie memory caches clean=9.23GiB dirty=6.59GiB
INFO [03-18|04:33:42.301] Allocated cache and file handles database=/data/geth/chaindata cache=10.55GiB handles=524,288
INFO [03-18|04:33:56.541] Opened ancient database database=/data/geth/chaindata/ancient readonly=false
INFO [03-18|04:33:56.556] Initialised chain configuration config="{ChainID: 56 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: 0, Ramanujan: 0, Niels: 0, MirrorSync: 5184000, Bruno: 13082000, Berlin: <nil>, YOLO v3: <nil>, Engine: parlia}"
INFO [03-18|04:33:56.562] Deep froze chain segment blocks=10 elapsed=8.340ms number=16,067,151 hash=1686df..6a862d
INFO [03-18|04:34:07.204] Initialising Ethereum protocol network=56 dbversion=8
INFO [03-18|04:34:15.660] Loaded most recent local header number=16,157,151 hash=1b581a..2afc24 td=32,116,076 age=39s
INFO [03-18|04:34:15.660] Loaded most recent local full block number=16,157,151 hash=1b581a..2afc24 td=32,116,076 age=39s
INFO [03-18|04:34:15.660] Loaded most recent local fast block number=16,157,151 hash=1b581a..2afc24 td=32,116,076 age=39s
INFO [03-18|04:34:15.661] Loaded last fast-sync pivot marker number=6,920,578
WARN [03-18|04:34:15.663] Head state missing, repairing number=16,157,151 hash=1b581a..2afc24
INFO [03-18|04:34:15.990] Loaded most recent local header number=16,157,151 hash=1b581a..2afc24 td=32,116,076 age=39s
INFO [03-18|04:34:15.990] Loaded most recent local full block number=16,156,997 hash=93e8f7..9a8974 td=32,115,782 age=8m47s
INFO [03-18|04:34:15.990] Loaded most recent local fast block number=16,157,151 hash=1b581a..2afc24 td=32,116,076 age=39s
INFO [03-18|04:34:15.990] Loaded last fast-sync pivot marker number=6,920,578
WARN [03-18|04:34:16.036] Sanitizing invalid txpool reannounce time provided=0s updated=1m0s
WARN [03-18|04:34:16.038] Sanitizing invalid gasprice oracle price cap provided=<nil> updated=500,000,000,000
WARN [03-18|04:34:16.039] Old unclean shutdowns found count=109
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T02:54:20+0000 age=1h39m56s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T03:03:42+0000 age=1h30m34s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T03:13:46+0000 age=1h20m30s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T03:22:50+0000 age=1h11m26s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T03:32:23+0000 age=1h1m53s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T03:39:28+0000 age=54m48s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T03:48:08+0000 age=46m8s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T03:57:13+0000 age=37m3s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T04:07:42+0000 age=26m34s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T04:15:09+0000 age=19m7s
WARN [03-18|04:34:16.039] Unclean shutdown detected booted=2022-03-18T04:23:25+0000 age=10m51s
INFO [03-18|04:34:16.039] Starting peer-to-peer node instance=Geth/v1.1.8-859186f2/linux-amd64/go1.16.13
INFO [03-18|04:34:16.083] New local node record seq=242 id=ec5982912a5ab8e6 ip=127.0.0.1 udp=30311 tcp=30311
INFO [03-18|04:34:16.084] Started P2P networking self=enode://761bae040c78a72b2a0c1d6bdb242439bf5887cd6f1f0361b6e091683e6b75c464c3e3b42878a8d6d927858f9ad50262a1e056784d1bcaea64f42f5392c99efa@127.0.0.1:30311
INFO [03-18|04:34:16.086] IPC endpoint opened url=/data/geth.ipc
INFO [03-18|04:34:16.086] HTTP server started endpoint=[::]:8545 prefix= cors= vhosts=*
INFO [03-18|04:34:16.086] WebSocket enabled url=ws://[::]:8546
INFO [03-18|04:34:16.197] Block synchronisation started
INFO [03-18|04:34:16.609] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=15.88KiB throttle=4128
INFO [03-18|04:34:16.685] Importing sidechain segment start=16,156,998 end=16,157,153
INFO [03-18|04:34:19.169] New local node record seq=243 id=ec5982912a5ab8e6 ip=162.55.82.39 udp=30311 tcp=30311
INFO [03-18|04:34:25.315] Imported new chain segment blocks=11 txs=1687 mgas=223.734 elapsed=8.630s mgasps=25.925 number=16,157,008 hash=e70b6d..881569 age=8m24s dirty=27.11MiB
INFO [03-18|04:34:33.481] Imported new chain segment blocks=14 txs=2185 mgas=299.933 elapsed=8.166s mgasps=36.728 number=16,157,022 hash=713aca..ffdd46 age=7m50s dirty=63.56MiB
INFO [03-18|04:34:41.797] Imported new chain segment blocks=17 txs=2706 mgas=372.304 elapsed=8.315s mgasps=44.772 number=16,157,039 hash=7dd0b9..0f5292 age=7m7s dirty=106.64MiB
INFO [03-18|04:34:50.007] Imported new chain segment blocks=18 txs=2722 mgas=367.450 elapsed=8.210s mgasps=44.753 number=16,157,057 hash=14bc71..94e025 age=6m22s dirty=151.00MiB
INFO [03-18|04:34:58.647] Imported new chain segment blocks=17 txs=2827 mgas=390.121 elapsed=8.639s mgasps=45.155 number=16,157,074 hash=f8ea1c..61f206 age=5m35s dirty=197.57MiB
got it, we will check that
The same happens to me. I start the node indicating 30GB as maximum memory in the parameter but after several hours the geth process occupies the 126 GB of RAM and shuts down without giving any error message.
~Issue seems to be fixed~ Not fixed.
~I haven't changed anything on my side though. Would be interesting to hear some comments what has happened.~
I have pulled off any requests from the node and it worked. Now I just returned it back to get normal load (which are regular websockets requests) - they killed it in just a matter of minute being just a few of them... I will try to research further what particular request is causing it.
UPDATE: It again started to happen without any specific load:
Guys, any update on this? Seems any BSC node running current version is affected badly...
Guys, any update on this? Seems any BSC node running current version is affected badly...
We are checking that, currently you can try using V1.1.7.
Take a look at your startup command, --cache
can't be set more than 1/3 * your ram size
Take a look at your startup command,
--cache
can't be set more than 1/3 *your ram size
Here is my current startup command. --cache
is 27GB on a host with 128GB RAM... It's bare metal, nothing besides BSC node runs there. Graphs above also giving a picture of normal state of 27GB and spike of roughly 4-5x before it dies.
As I mentioned before this setup was running smooth and steady since months... Even 1.1.8 was running fine, I upgraded it in February already. It just started to OOM out of the blue on 17-th of March. I can see other external nodes dying with exactly same pattern, they keep crashing and restarting.
When the memory surges, can you take a profile for the node? You can use curl -sK -v http://127.0.0.1:6060/debug/pprof/heap > heap.profile
.
You can enable the pprof server by adding --pprof
in the command line.
I am currently trying to research some pattern why it happens. Seems to be some sort of requests. I'll update this ticket once I have more details (will collect profile too).
I am currently trying to research some pattern why it happens. Seems to be some sort of requests. I'll update this ticket once I have more details (will collect profile too).
Is there any news?
This OOM behaviour could be addressed to a portion of private code. Closing.