electrumx icon indicating copy to clipboard operation
electrumx copied to clipboard

ElectrumX server hang with "INFO:Prefetcher:cancelled; prefetcher stopping", and then crashed on SIGTERM

Open github12101 opened this issue 5 years ago • 6 comments

I observed that my electrumx_server hang completely and it was not responding to any queries, by clients or by RPC. I inspected the log:

(...)
2020-09-11 12:39:16,695:INFO:ElectrumX:[8329980] SSL 85.26.232.243:34434, 236 total
2020-09-11 12:39:16,823:INFO:ElectrumX:[8329981] SSL 123.25.70.15:65462, 236 total
2020-09-11 12:39:16,826:INFO:ElectrumX:[8329982] SSL 168.0.176.3:49000, 237 total
2020-09-11 12:39:17,147:INFO:ElectrumX:[8329983] SSL 103.224.185.201:57543, 232 total
2020-09-11 12:39:18,725:INFO:Prefetcher:cancelled; prefetcher stopping 
2020-09-11 12:39:18,733:INFO:ElectrumX:[8329984] SSL 122.176.77.218:49194, 232 total
2020-09-11 12:39:18,805:INFO:BlockProcessor:flushing to DB for a clean shutdown...
2020-09-11 12:39:18,806:INFO:ElectrumX:[8329985] SSL 14.188.159.221:59038, 233 total
2020-09-11 12:39:18,806:INFO:ElectrumX:[8329986] SSL 5.127.4.37:35106, 234 total
2020-09-11 12:39:18,806:INFO:ElectrumX:[8329987] SSL 43.241.145.198:17845, 235 total
2020-09-11 12:39:18,807:INFO:ElectrumX:[8329988] SSL 115.74.225.96:57235, 236 total
2020-09-11 12:39:18,813:INFO:SessionManager:closing down server for ssl://all_interfaces:50002
2020-09-11 12:39:18,813:INFO:SessionManager:closing down server for rpc://localhost:8000
2020-09-11 12:39:18,857:INFO:ElectrumX:[940752] disconnected.  Sent 1,694,708 bytes in 29,573 messages
2020-09-11 12:39:18,857:INFO:ElectrumX:[241949] disconnected.  Sent 2,020,329 bytes in 35,213 messages
(...dozens disconnected messages and disconnected whilst throttled)

So by the look of it, electrumx_server has stopped operation, shutdown DB on it's own and just sat there. All clients were disconnected and all activity ceased. Day after I discovered this state. I issued signal 15 SIGTERM from htop to kill hanging process. Then I noticed in the log:

2020-09-12 11:50:13,135:WARNING:Controller:received SIGTERM signal, initiating shutdown
2020-09-12 11:50:13,135:INFO:Controller:shutting down
2020-09-12 11:50:13,135:INFO:Controller:shutdown complete
2020-09-12 11:50:13,160:INFO:ElectrumX:[6417548] disconnected whilst throttled
2020-09-12 11:50:13,160:INFO:ElectrumX:[1721458] disconnected whilst throttled
2020-09-12 11:50:13,160:INFO:ElectrumX:[2920030] disconnected whilst throttled
2020-09-12 11:50:13,160:INFO:ElectrumX:[4422139] disconnected whilst throttled
2020-09-12 11:50:13,160:INFO:ElectrumX:[5994779] disconnected whilst throttled

Last 5 "disconnected whilst throttled" messages. And I thought electrum is completely disconnected, because it's been silent for 24 hours. Straight after these lines we have:

2020-09-12 11:50:13,161:ERROR:electrumx:ElectrumX server terminated abnormally
Traceback (most recent call last):
  File "/home/nv01/.local/bin/electrumx_server", line 35, in main
    asyncio.run(controller.run())
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/lib/server_base.py", line 125, in run
    await server_task
  File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/lib/server_base.py", line 98, in serve
    await self.serve(shutdown_event)
  File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/server/controller.py", line 134, in serve
    await group.spawn(wait_for_catchup())
  File "/home/nv01/.local/lib/python3.7/site-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/home/nv01/.local/lib/python3.7/site-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/server/mempool.py", line 335, in keep_synchronized
    await group.spawn(self._logging(synchronized_event))
  File "/home/nv01/.local/lib/python3.7/site-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/home/nv01/.local/lib/python3.7/site-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/server/mempool.py", line 221, in _refresh_hashes
    hashes = {hex_str_to_hash(hh) for hh in hex_hashes}
  File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/server/mempool.py", line 221, in <setcomp>
    hashes = {hex_str_to_hash(hh) for hh in hex_hashes}
  File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/lib/hash.py", line 60, in hex_str_to_hash
    return bytes(reversed(hex_to_bytes(x)))
ValueError: non-hexadecimal number found in fromhex() arg at position 50

Expected behaviour: electrumx_server should never issue "INFO:Prefetcher:cancelled; prefetcher stopping" and "INFO:BlockProcessor:flushing to DB for a clean shutdown..." unasked. Also, on SIGTERM is should not crash like it did.

I restarted electrumX server and it's working fine. I issued SIGTERM once again right now to see how it behaves in normal way, this is what happens, as expected:

2020-09-12 20:10:46,137:WARNING:Controller:received SIGTERM signal, initiating shutdown
2020-09-12 20:10:46,137:INFO:Controller:shutting down
2020-09-12 20:10:46,138:INFO:Prefetcher:cancelled; prefetcher stopping 
2020-09-12 20:10:46,140:INFO:BlockProcessor:flushing to DB for a clean shutdown...
2020-09-12 20:10:46,141:INFO:SessionManager:closing down server for ssl://all_interfaces:50002
2020-09-12 20:10:46,142:INFO:SessionManager:closing down server for rpc://localhost:8000
2020-09-12 20:10:46,161:INFO:ElectrumX:[110755] disconnected whilst throttled (...dozens of times...)
2020-09-12 20:10:47,156:INFO:Controller:shutdown complete
2020-09-12 20:10:47,157:INFO:electrumx:ElectrumX server terminated normally

Notice that during normal operation message "INFO:Controller:shutdown complete" was not instant like before. Before, electrumx was in "hung" state and there was nothing more to close, so "shutdown complete" message was instant at 0ms. In normal operation, it took 1019ms between "initiating shutdown" and "shutdown complete".

System: OS: Debian GNU/Linux 10 (buster) x86_64 Kernel: 5.7.0-0.bpo.2-amd64 Electrum straight from git, latest commit:

commit 9ca59facfdb50daf063a5813fc9b08eb6fa1b246 (HEAD -> master, origin/master, origin/HEAD)
Merge: 148df48 23633c2
Author: ghost43 <[email protected]>
Date:   Mon Aug 24 17:30:54 2020 +0000

github12101 avatar Sep 12 '20 19:09 github12101

Maybe bitcoind responded to getrawmempool with garbage? https://github.com/spesmilo/electrumx/blob/9ca59facfdb50daf063a5813fc9b08eb6fa1b246/electrumx/server/mempool.py#L218-L221 https://github.com/spesmilo/electrumx/blob/9ca59facfdb50daf063a5813fc9b08eb6fa1b246/electrumx/server/daemon.py#L231-L233

SomberNight avatar Sep 13 '20 00:09 SomberNight

I am not sure how I can check that. Both bitcoind and electrumX have been started 7 days ago. Bitcoind continues to operate, its uptime is 7d 19h, it works just fine but electrumX has crashed like explained above. I am using newest bitcoind binary 0.20.1.

github12101 avatar Sep 13 '20 07:09 github12101

I also seem to be experiencing this exact issue.

atroxes avatar Sep 27 '20 16:09 atroxes

Yep, keeps happening here as well:

Sep 28 11:41:26 INFO:ElectrumX:[74666] disconnected whilst throttled
Sep 28 11:41:26 INFO:ElectrumX:[45710] disconnected whilst throttled
Sep 28 11:41:26 INFO:ElectrumX:[414] disconnected whilst throttled
Sep 28 11:41:26 INFO:ElectrumX:[28204] disconnected whilst throttled
Sep 28 11:41:26 INFO:ElectrumX:[32754] disconnected whilst throttled
Sep 28 11:41:26 INFO:Prefetcher:cancelled; prefetcher stopping
Sep 28 11:41:26 INFO:BlockProcessor:flushing to DB for a clean shutdown...
Sep 28 11:41:26 INFO:Controller:shutting down
Sep 28 11:41:26 INFO:Controller:shutdown complete
Sep 28 11:41:26 ERROR:electrumx:ElectrumX server terminated abnormally

atroxes avatar Sep 28 '20 11:09 atroxes

Anecdotal update: Was using aiorpcX 0.18.3. Forced update to aiorpcX 0.18.4 and no crashes for 8 hours. So far so good.

Update to aiorpcX 0.18.4: pip install -Ifv aiorpcX==0.18.4

atroxes avatar Oct 07 '20 17:10 atroxes

Anecdotal update: Was using aiorpcX 0.18.3. Forced update to aiorpcX 0.18.4 and no crashes for 8 hours. So far so good.

Update to aiorpcX 0.18.4: pip install -Ifv aiorpcX==0.18.4

8 hours is nothing. My electrumx crashes regularly with different errors with uptime from less than 1 day up to 10 days, and I have aiorpcX at 0.18.4. Test for longer period of time.

github12101 avatar Oct 08 '20 00:10 github12101