electrumx
electrumx copied to clipboard
ElectrumX server hang with "INFO:Prefetcher:cancelled; prefetcher stopping", and then crashed on SIGTERM
I observed that my electrumx_server hang completely and it was not responding to any queries, by clients or by RPC. I inspected the log:
(...)
2020-09-11 12:39:16,695:INFO:ElectrumX:[8329980] SSL 85.26.232.243:34434, 236 total
2020-09-11 12:39:16,823:INFO:ElectrumX:[8329981] SSL 123.25.70.15:65462, 236 total
2020-09-11 12:39:16,826:INFO:ElectrumX:[8329982] SSL 168.0.176.3:49000, 237 total
2020-09-11 12:39:17,147:INFO:ElectrumX:[8329983] SSL 103.224.185.201:57543, 232 total
2020-09-11 12:39:18,725:INFO:Prefetcher:cancelled; prefetcher stopping
2020-09-11 12:39:18,733:INFO:ElectrumX:[8329984] SSL 122.176.77.218:49194, 232 total
2020-09-11 12:39:18,805:INFO:BlockProcessor:flushing to DB for a clean shutdown...
2020-09-11 12:39:18,806:INFO:ElectrumX:[8329985] SSL 14.188.159.221:59038, 233 total
2020-09-11 12:39:18,806:INFO:ElectrumX:[8329986] SSL 5.127.4.37:35106, 234 total
2020-09-11 12:39:18,806:INFO:ElectrumX:[8329987] SSL 43.241.145.198:17845, 235 total
2020-09-11 12:39:18,807:INFO:ElectrumX:[8329988] SSL 115.74.225.96:57235, 236 total
2020-09-11 12:39:18,813:INFO:SessionManager:closing down server for ssl://all_interfaces:50002
2020-09-11 12:39:18,813:INFO:SessionManager:closing down server for rpc://localhost:8000
2020-09-11 12:39:18,857:INFO:ElectrumX:[940752] disconnected. Sent 1,694,708 bytes in 29,573 messages
2020-09-11 12:39:18,857:INFO:ElectrumX:[241949] disconnected. Sent 2,020,329 bytes in 35,213 messages
(...dozens disconnected messages and disconnected whilst throttled)
So by the look of it, electrumx_server has stopped operation, shutdown DB on it's own and just sat there. All clients were disconnected and all activity ceased. Day after I discovered this state. I issued signal 15 SIGTERM from htop to kill hanging process. Then I noticed in the log:
2020-09-12 11:50:13,135:WARNING:Controller:received SIGTERM signal, initiating shutdown
2020-09-12 11:50:13,135:INFO:Controller:shutting down
2020-09-12 11:50:13,135:INFO:Controller:shutdown complete
2020-09-12 11:50:13,160:INFO:ElectrumX:[6417548] disconnected whilst throttled
2020-09-12 11:50:13,160:INFO:ElectrumX:[1721458] disconnected whilst throttled
2020-09-12 11:50:13,160:INFO:ElectrumX:[2920030] disconnected whilst throttled
2020-09-12 11:50:13,160:INFO:ElectrumX:[4422139] disconnected whilst throttled
2020-09-12 11:50:13,160:INFO:ElectrumX:[5994779] disconnected whilst throttled
Last 5 "disconnected whilst throttled" messages. And I thought electrum is completely disconnected, because it's been silent for 24 hours. Straight after these lines we have:
2020-09-12 11:50:13,161:ERROR:electrumx:ElectrumX server terminated abnormally
Traceback (most recent call last):
File "/home/nv01/.local/bin/electrumx_server", line 35, in main
asyncio.run(controller.run())
File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
return future.result()
File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/lib/server_base.py", line 125, in run
await server_task
File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/lib/server_base.py", line 98, in serve
await self.serve(shutdown_event)
File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/server/controller.py", line 134, in serve
await group.spawn(wait_for_catchup())
File "/home/nv01/.local/lib/python3.7/site-packages/aiorpcx/curio.py", line 242, in __aexit__
await self.join()
File "/home/nv01/.local/lib/python3.7/site-packages/aiorpcx/curio.py", line 211, in join
raise task.exception()
File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/server/mempool.py", line 335, in keep_synchronized
await group.spawn(self._logging(synchronized_event))
File "/home/nv01/.local/lib/python3.7/site-packages/aiorpcx/curio.py", line 242, in __aexit__
await self.join()
File "/home/nv01/.local/lib/python3.7/site-packages/aiorpcx/curio.py", line 211, in join
raise task.exception()
File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/server/mempool.py", line 221, in _refresh_hashes
hashes = {hex_str_to_hash(hh) for hh in hex_hashes}
File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/server/mempool.py", line 221, in <setcomp>
hashes = {hex_str_to_hash(hh) for hh in hex_hashes}
File "/home/nv01/.local/lib/python3.7/site-packages/electrumx/lib/hash.py", line 60, in hex_str_to_hash
return bytes(reversed(hex_to_bytes(x)))
ValueError: non-hexadecimal number found in fromhex() arg at position 50
Expected behaviour: electrumx_server should never issue "INFO:Prefetcher:cancelled; prefetcher stopping" and "INFO:BlockProcessor:flushing to DB for a clean shutdown..." unasked. Also, on SIGTERM is should not crash like it did.
I restarted electrumX server and it's working fine. I issued SIGTERM once again right now to see how it behaves in normal way, this is what happens, as expected:
2020-09-12 20:10:46,137:WARNING:Controller:received SIGTERM signal, initiating shutdown
2020-09-12 20:10:46,137:INFO:Controller:shutting down
2020-09-12 20:10:46,138:INFO:Prefetcher:cancelled; prefetcher stopping
2020-09-12 20:10:46,140:INFO:BlockProcessor:flushing to DB for a clean shutdown...
2020-09-12 20:10:46,141:INFO:SessionManager:closing down server for ssl://all_interfaces:50002
2020-09-12 20:10:46,142:INFO:SessionManager:closing down server for rpc://localhost:8000
2020-09-12 20:10:46,161:INFO:ElectrumX:[110755] disconnected whilst throttled (...dozens of times...)
2020-09-12 20:10:47,156:INFO:Controller:shutdown complete
2020-09-12 20:10:47,157:INFO:electrumx:ElectrumX server terminated normally
Notice that during normal operation message "INFO:Controller:shutdown complete" was not instant like before. Before, electrumx was in "hung" state and there was nothing more to close, so "shutdown complete" message was instant at 0ms. In normal operation, it took 1019ms between "initiating shutdown" and "shutdown complete".
System: OS: Debian GNU/Linux 10 (buster) x86_64 Kernel: 5.7.0-0.bpo.2-amd64 Electrum straight from git, latest commit:
commit 9ca59facfdb50daf063a5813fc9b08eb6fa1b246 (HEAD -> master, origin/master, origin/HEAD)
Merge: 148df48 23633c2
Author: ghost43 <[email protected]>
Date: Mon Aug 24 17:30:54 2020 +0000
Maybe bitcoind responded to getrawmempool with garbage?
https://github.com/spesmilo/electrumx/blob/9ca59facfdb50daf063a5813fc9b08eb6fa1b246/electrumx/server/mempool.py#L218-L221
https://github.com/spesmilo/electrumx/blob/9ca59facfdb50daf063a5813fc9b08eb6fa1b246/electrumx/server/daemon.py#L231-L233
I am not sure how I can check that. Both bitcoind and electrumX have been started 7 days ago. Bitcoind continues to operate, its uptime is 7d 19h, it works just fine but electrumX has crashed like explained above. I am using newest bitcoind binary 0.20.1.
I also seem to be experiencing this exact issue.
Yep, keeps happening here as well:
Sep 28 11:41:26 INFO:ElectrumX:[74666] disconnected whilst throttled
Sep 28 11:41:26 INFO:ElectrumX:[45710] disconnected whilst throttled
Sep 28 11:41:26 INFO:ElectrumX:[414] disconnected whilst throttled
Sep 28 11:41:26 INFO:ElectrumX:[28204] disconnected whilst throttled
Sep 28 11:41:26 INFO:ElectrumX:[32754] disconnected whilst throttled
Sep 28 11:41:26 INFO:Prefetcher:cancelled; prefetcher stopping
Sep 28 11:41:26 INFO:BlockProcessor:flushing to DB for a clean shutdown...
Sep 28 11:41:26 INFO:Controller:shutting down
Sep 28 11:41:26 INFO:Controller:shutdown complete
Sep 28 11:41:26 ERROR:electrumx:ElectrumX server terminated abnormally
Anecdotal update: Was using aiorpcX 0.18.3. Forced update to aiorpcX 0.18.4 and no crashes for 8 hours. So far so good.
Update to aiorpcX 0.18.4:
pip install -Ifv aiorpcX==0.18.4
Anecdotal update: Was using aiorpcX 0.18.3. Forced update to aiorpcX 0.18.4 and no crashes for 8 hours. So far so good.
Update to aiorpcX 0.18.4:
pip install -Ifv aiorpcX==0.18.4
8 hours is nothing. My electrumx crashes regularly with different errors with uptime from less than 1 day up to 10 days, and I have aiorpcX at 0.18.4. Test for longer period of time.