ElectrumX keeps crashing during its initial sync with Bitcoind. Out of memory?

Open 20a8vnf30 opened this issue 5 years ago • 9 comments

ElectrumX keeps crashing during its initial sync with Bitcoind, and it seems memory related. I'm running ElectrumX 1.15.0 on a Raspberry Pi 4 with 4GB RAM, Raspbian 4.19.97. Other running applications: Bitcoind 0.20.0 as the backend, and LND 0.10.1, which takes up some memory. This is the error:

Jul 01 20:50:47 Pi electrumx_server[24345]: INFO:BlockProcessor:our height: 339,907 daemon: 637,215 UTXOs 1,621MB hist 582MB                                                                                       
Jul 01 20:51:47 Pi electrumx_server[24345]: INFO:BlockProcessor:our height: 340,004 daemon: 637,215 UTXOs 1,624MB hist 601MB                                                                                       
Jul 01 20:51:49 Pi electrumx_server[24345]: INFO:DB:flushed filesystem data in 1.89s                                                                                                                               
Jul 01 20:52:04 Pi electrumx_server[24345]: terminate called after throwing an instance of 'std::bad_alloc'                                                                                                        
Jul 01 20:52:04 Pi electrumx_server[24345]:   what():  std::bad_alloc                                                                                                                                              
Jul 01 20:52:05 Pi systemd[1]: electrumx.service: Main process exited, code=killed, status=6/ABRT                                                                                                                  
Jul 01 20:52:05 Pi systemd[1]: electrumx.service: Failed with result 'signal'.                                                                                                                                     
Jul 01 20:52:10 Pi systemd[1]: electrumx.service: Service RestartSec=5s expired, scheduling restart.                                                                                                               
Jul 01 20:52:10 Pi systemd[1]: electrumx.service: Scheduled restart job, restart counter is at 1.                                                                                                                  
Jul 01 20:52:10 Pi systemd[1]: Stopped Electrumx server daemon.
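The BlockProcessor lines in a log like the one above report the UTXO and history cache sizes once a minute, so they can be tracked against the CACHE_MB setting. The following is a minimal Python sketch for pulling those numbers out of a log line. The regular expression is derived only from the two sample lines shown here, and the names `PROGRESS_RE` and `parse_progress` are hypothetical, not part of ElectrumX.

```python
import re

# Pattern inferred from the sample log lines in this thread (an assumption,
# not an official ElectrumX log format specification).
PROGRESS_RE = re.compile(
    r"our height: (?P<height>[\d,]+) daemon: (?P<daemon>[\d,]+) "
    r"UTXOs (?P<utxo>[\d,]+)MB hist (?P<hist>[\d,]+)MB"
)


def parse_progress(line):
    """Return the numbers from a progress line as ints, or None if no match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    # Strip the thousands separators before converting.
    return {key: int(value.replace(",", "")) for key, value in match.groupdict().items()}


sample = (
    "Jul 01 20:51:47 Pi electrumx_server[24345]: INFO:BlockProcessor:"
    "our height: 340,004 daemon: 637,215 UTXOs 1,624MB hist 601MB"
)
info = parse_progress(sample)
# Combined size of the two caches reported on this line, in MB.
print(info["utxo"] + info["hist"])
```

Feeding the output of `journalctl -u electrumx` through a function like this would show how close the reported cache sizes were to the configured limit at the time of each crash.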

I've set CACHE_MB = 2000, and then 3000. This delays the crash significantly but does not prevent it. During sync I can see the process eating up my RAM over the course of an hour. When the system hits around 3.3 GB used out of 3.81 GB total RAM, the process crashes and the memory is freed up.
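To put numbers on the memory growth described here, the resident set size of the server process can be sampled from `/proc/<pid>/status` on Linux. This is a rough sketch only: the helper names `parse_vmrss_kb` and `read_rss_mb` are made up for illustration, and the PID would have to be looked up separately (for example from the `electrumx_server[24345]` prefix in the log).

```python
def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1])
    return None


def read_rss_mb(pid):
    """Read the current resident set size of a process in MB (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        kb = parse_vmrss_kb(status_file.read())
    return None if kb is None else kb / 1024


# Illustrative input only; real values come from the running process.
example_status = "Name:\telectrumx_server\nVmRSS:\t 3379200 kB\n"
print(parse_vmrss_kb(example_status))
```

Calling `read_rss_mb` once a minute and logging the result next to the BlockProcessor cache figures would show whether the process grows well beyond what the caches account for.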

The process does not seem to release any of the RAM it uses, even after successfully flushing filesystem data several times before eventually crashing.

20a8vnf30 avatar Jul 01 '20 19:07 20a8vnf30

Can confirm. I saw the same thing on low memory systems. And by low memory I mean 4G and less. There’s probably some leak in the initial block processing.

whizz avatar Jul 02 '20 14:07 whizz

I notice that with every crash I lose not only the progress made since the most recent flush, but also progress from before it. Is that supposed to happen? I'm not really making any progress at this point, and the Pi has been at it for a few days now.

20a8vnf30 avatar Jul 02 '20 20:07 20a8vnf30

@whizz and @RobinTick, has this been a recent occurrence, or something that also occurred before the spesmilo fork?

JustinTArthur avatar Jul 03 '20 20:07 JustinTArthur

@JustinTArthur This 4GB Raspberry Pi is the first system I'm running the Spesmilo fork on. I'm successfully running the older Marodian fork (which still supported Bitcoin) on an x86 Intel NUC with 8GB RAM. CACHE_MB is set to 1800 there, and I don't remember it crashing this much on initial sync.

20a8vnf30 avatar Jul 03 '20 20:07 20a8vnf30

Oh no, that is nothing new. It was this way a few years back and is still the case now. I'll have some time next week, so I might finally try to take a look at it.

I remember that I reported this way back in 2017 here https://github.com/kyuupichan/electrumx/issues/179#issue-227452514 but it was closed.

whizz avatar Jul 04 '20 05:07 whizz

Update on progress. Running Spesmilo 0.15.0 on an 8GB RAM x86 system is a better experience. I saw maybe two process crashes during initial sync, but it did achieve full synchronization.

The 4GB RAM Raspberry Pi is still crashing about every hour and is making very slow progress. I tried higher and lower values for CACHE_MB, but I'm unable to prevent the process from crashing. With every crash almost an hour of progress is lost. If the process committed new data to disk more often, that could mitigate the problem somewhat, though I'm convinced the underlying issue is something akin to a memory leak.

20a8vnf30 avatar Jul 11 '20 14:07 20a8vnf30

I managed to resolve my crashing on the 4GB Raspberry Pi by setting CACHE_MB = 500, deleting the data and starting over from scratch. I have experienced zero crashes thus far. I noticed initial progress was MUCH faster this time, almost like all the crashing during my previous sync attempt had left database issues that kept impeding further progress even after the crashing had stopped.

20a8vnf30 avatar Jul 19 '20 10:07 20a8vnf30

I am also having memory errors during the initial sync with version 1.15.0 and 16 GB RAM. I did not have this problem with an earlier version: I did the first full sync with version 1.13.0 without a crash. Now it is crashing constantly. The error is different from RobinTick's. At first I had CACHE_MB = 4096. With CACHE_MB = 2048 it seems better.

Jul 16 01:00:45 server1 electrumx_server[613]: INFO:Prefetcher:cancelled; prefetcher stopping
Jul 16 01:00:45 server1 electrumx_server[613]: INFO:SessionManager:closing down server for rpc://127.0.0.1:8000
Jul 16 01:00:45 server1 electrumx_server[613]: INFO:Controller:shutting down
Jul 16 01:00:45 server1 electrumx_server[613]: INFO:Controller:shutdown complete
Jul 16 01:00:46 server1 electrumx_server[613]: ERROR:electrumx:ElectrumX server terminated abnormally
Jul 16 01:00:46 server1 electrumx_server[613]: Traceback (most recent call last):
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx_server", line 35, in main
Jul 16 01:00:46 server1 electrumx_server[613]:     asyncio.run(controller.run())
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
Jul 16 01:00:46 server1 electrumx_server[613]:     return loop.run_until_complete(main)
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
Jul 16 01:00:46 server1 electrumx_server[613]:     return future.result()
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/lib/server_base.py", line 125, in run
Jul 16 01:00:46 server1 electrumx_server[613]:     await server_task
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/lib/server_base.py", line 98, in serve
Jul 16 01:00:46 server1 electrumx_server[613]:     await self.serve(shutdown_event)
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/server/controller.py", line 134, in serve
Jul 16 01:00:46 server1 electrumx_server[613]:     await group.spawn(wait_for_catchup())
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
Jul 16 01:00:46 server1 electrumx_server[613]:     await self.join()
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
Jul 16 01:00:46 server1 electrumx_server[613]:     raise task.exception()
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/server/block_processor.py", line 681, in fetch_and_process_blocks
Jul 16 01:00:46 server1 electrumx_server[613]:     await group.spawn(self._process_prefetched_blocks())
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
Jul 16 01:00:46 server1 electrumx_server[613]:     await self.join()
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
Jul 16 01:00:46 server1 electrumx_server[613]:     raise task.exception()
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/server/block_processor.py", line 642, in _process_prefetched_blocks
Jul 16 01:00:46 server1 electrumx_server[613]:     await self.check_and_advance_blocks(blocks)
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/server/block_processor.py", line 219, in check_and_advance_blocks
Jul 16 01:00:46 server1 electrumx_server[613]:     await self.run_in_thread_with_lock(self.advance_blocks, blocks)
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/server/block_processor.py", line 202, in run_in_thread_with_lock
Jul 16 01:00:46 server1 electrumx_server[613]:     return await asyncio.shield(run_in_thread_locked())
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/server/block_processor.py", line 201, in run_in_thread_locked
Jul 16 01:00:46 server1 electrumx_server[613]:     return await run_in_thread(func, *args)
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 68, in run_in_thread
Jul 16 01:00:46 server1 electrumx_server[613]:     return await get_event_loop().run_in_executor(None, func, *args)
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Jul 16 01:00:46 server1 electrumx_server[613]:     result = self.fn(*self.args, **self.kwargs)
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/server/block_processor.py", line 399, in advance_blocks
Jul 16 01:00:46 server1 electrumx_server[613]:     undo_info = self.advance_txs(block.transactions, is_unspendable)
Jul 16 01:00:46 server1 electrumx_server[613]:   File "/home/electrumx/electrumx/server/block_processor.py", line 448, in advance_txs
Jul 16 01:00:46 server1 electrumx_server[613]:     hashX + tx_numb + to_le_uint64(txout.value))
Jul 16 01:00:46 server1 electrumx_server[613]: MemoryError
Jul 16 01:01:10 server1 systemd[1]: electrumx.service: Succeeded.

Robot1982 avatar Jul 20 '20 05:07 Robot1982

> I managed to resolve my crashing on the 4GB Raspberry Pi by setting CACHE_MB = 500, deleting the data and starting over from scratch. I have experienced zero crashes thus far. I noticed initial progress was MUCH faster this time, almost like all the crashing during my previous sync attempt had left database issues that kept impeding further progress even after the crashing had stopped.

For next time, I would suggest adding plenty of swap to your Raspberry Pi, even on a cheap USB stick. That will at least prevent the crashes. Alternatively, you can wrap electrumx in a script that restarts it whenever a crash is encountered. That's what I did, and I have caught many crashes already, though never during sync (I've got plenty of RAM).
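A restart wrapper of the kind described here could look roughly like the Python sketch below. It is only an illustration: the function name `run_with_restarts` and its parameters are hypothetical, and the command to launch depends on the installation. Note that the systemd unit in the first post's log already restarts the service after 5 seconds, so a wrapper like this mainly helps where electrumx is started by hand.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, max_restarts=3, delay_s=5.0):
    """Run cmd and start it again after each abnormal exit.

    Returns the list of exit codes seen. Stops after a clean exit (code 0)
    or once max_restarts restarts have been used up.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # A process killed by a signal reports a negative return code,
        # for example -6 for SIGABRT.
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            break
        if attempt < max_restarts:
            time.sleep(delay_s)
    return exit_codes


if __name__ == "__main__":
    # Stand-in command that always fails, used only to demonstrate the loop.
    demo_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_restarts(demo_cmd, max_restarts=2, delay_s=0.1))
```

Capping the number of restarts is a deliberate choice in this sketch: as reported earlier in the thread, repeated crashes during initial sync can lose progress each time, so an unbounded loop may hide a sync that is not actually advancing.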

github12101 avatar Oct 14 '20 00:10 github12101