electrum-server

Server doesn't accept new connections after a while

Open bauerj opened this issue 10 years ago • 17 comments

This is the last SSL-connection message in the log:

[27/10/2015-09:36:47] SSL     XX.XXX.88.65:49605    1 2.4.4

It's Tue 27 Oct 21:28:38 CET 2015 right now which means the server didn't receive a new connection for 12 hours.

The server is still connected to irc and thus visible in the peer list.

When I try to connect from the client with SSL I get a "Not connected" message.

When I try to connect with TCP, the server accepts the connection but the client doesn't seem to sync.

electrum-server getinfo still works and reports 171 open sessions.

Connecting with openssl s_client fails too:

root@bauerj ~ # openssl s_client -connect electrum.bauerj.eu:50002
connect: Connection refused
connect:errno=111

I can connect to the TCP port but I get no response:

telnet electrum.bauerj.eu 50001
Trying 192.198.88.125...
Connected to electrum.bauerj.eu.
Escape character is '^]'.
{"id": 0, "method": "server.version", "params": []}
{"id": 0, "method": "server.version", "params": []}

The server imports blocks as expected. Restarting makes the server accept connections again.

bauerj avatar Oct 27 '15 20:10 bauerj

this probably means that the ssl thread died. please try to find a traceback in your log and post it here.

ecdsa avatar Oct 27 '15 21:10 ecdsa

The last traceback was 5 days ago (and unrelated).

bauerj avatar Oct 27 '15 21:10 bauerj

[22/10/2015-15:20:34] irc
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/electrumserver/ircthread.py", line 137, in run
    client.process_forever()
  File "build/bdist.linux-x86_64/egg/irc/client.py", line 278, in process_forever
    self.process_once(timeout)
  File "build/bdist.linux-x86_64/egg/irc/client.py", line 259, in process_once
    self.process_data(i)
  File "build/bdist.linux-x86_64/egg/irc/client.py", line 216, in process_data
    c.process_data()
  File "build/bdist.linux-x86_64/egg/irc/client.py", line 573, in process_data

bauerj avatar Oct 27 '15 21:10 bauerj

electrum.bauerj.eu shows this behaviour again. Is there anything I can do to debug this?

I'm not restarting it for now.

bauerj avatar Nov 04 '15 21:11 bauerj

How recent is the codebase you're running and how much RAM do you have in the server and free to use for electrum server? It might be related to #126 just with a different symptom

EagleTM avatar Nov 09 '15 13:11 EagleTM

@EagleTM no, it's different. his ssl thread died.

ecdsa avatar Nov 09 '15 13:11 ecdsa

However I have at least one other server operator who runs with 16 gig and 512 MB swap and runs into this issue. I think it's one possible side effect of memory starvation. The others being std::bad_alloc crash or OOM killer

EagleTM avatar Dec 14 '15 09:12 EagleTM

OK, it's independent of the RAM thing after all. It seems that under some circumstances the ssl thread dies, apparently pretty early after starting up, while tcp and irc are still working. No idea how to debug it for the time being. I suspect openssl is acting up (possibly some security bugfix).

EagleTM avatar Dec 26 '15 11:12 EagleTM

Ignore my last comment. In this particular case it was an expired SSL certificate.
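
For operators who want to rule out the expired-certificate case quickly, here is a minimal sketch of an expiry check. It assumes you already have the certificate's `notAfter` string, for example from `openssl x509 -noout -enddate` or from `SSLSocket.getpeercert()`. The helper name `days_until_expiry` is hypothetical and not part of electrum-server.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the number of days until a certificate expires.

    not_after is the 'notAfter' string in the format used by
    SSLSocket.getpeercert(), e.g. 'Jan  3 12:00:00 2016 GMT'.
    A negative result means the certificate has already expired.
    """
    if now is None:
        now = time.time()
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    return (expires - now) / 86400.0


# Illustrative dates only, not taken from any real certificate.
reference = ssl.cert_time_to_seconds("Jan  1 00:00:00 2016 GMT")
remaining = days_until_expiry("Jan  3 00:00:00 2016 GMT", now=reference)
overdue = days_until_expiry("Dec 30 00:00:00 2015 GMT", now=reference)
```

A check like this could run from cron and warn when the value drops below some threshold, which would catch the problem before the SSL port stops working.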

EagleTM avatar Jan 03 '16 12:01 EagleTM

Having the same issue. TCP or SSL stops accepting connections. SSL happened last night and TCP tonight. Checked traceback and no entries about anything stopping.

danny91 avatar Jan 12 '16 19:01 danny91

I always seem to run into this issue, though personally I haven't looked in a while. Has there been a patch for this yet?

luggs-co avatar Jan 30 '16 04:01 luggs-co

Today I found my TCP unresponsive (although it appears to have been dead for about 23 hours). Here's what I've found so far. "command" will be short for run_electrum_server.py debug "command". transports[0] is my TcpServer thread.

"transports[0]" shows the TcpServer thread is stopped/dead. Server wasn't listening on 50001 at all. Trying to start the thread with "transports[0].run()" failed: File "build/bdist.linux-x86_64/egg/electrumserver/stratum_tcp.py", line 220, in run | IOError: [Errno 2] No such file or directory. Not sure what that's about.

electrum-server sessions | grep ^TCP still shows old sessions, that netstat shows as CLOSE_WAIT. It seems the dead thread doesn't release resources.

"transports.remove(transports[0])" and "transports.append(TcpServer(dispatcher, 'btc.smsys.me', 50001, False, None, None))" recreated the thread. "transports[1].start()" got it to accept TCP sessions again.

I'll try to make some time to debug this. I don't quite understand a bit of what's going on in stratum_tcp.py yet.

valesi avatar Feb 04 '16 01:02 valesi

I have seen the same problem. There is an unhandled exception that is output to the console but not the log files. Here is an example:

Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "build/bdist.linux-x86_64/egg/electrumserver/stratum_tcp.py", line 287, in run
    stop_session(fd)
  File "build/bdist.linux-x86_64/egg/electrumserver/stratum_tcp.py", line 164, in stop_session
    session.stop()
  File "build/bdist.linux-x86_64/egg/electrumserver/processor.py", line 237, in stop
    self.dispatcher.remove_session(self)
  File "build/bdist.linux-x86_64/egg/electrumserver/processor.py", line 189, in remove_session
    del self.sessions[key]
KeyError: 'x.x.x.x:51335'

Nothing catches this exception so the whole thread dies.
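
As a hedged illustration of this failure mode (not the actual fix, which lives in the commit referenced further down), removing a session twice does not have to raise. The `Dispatcher` class below is a hypothetical stand-in for the session registry, not the electrum-server class:

```python
import threading


class Dispatcher(object):
    """Hypothetical session registry, for illustration only."""

    def __init__(self):
        self.lock = threading.Lock()
        self.sessions = {}

    def add_session(self, key, session):
        with self.lock:
            self.sessions[key] = session

    def remove_session(self, key):
        # dict.pop() with a default never raises KeyError, so removing
        # a key that is already gone is a harmless no-op.
        with self.lock:
            return self.sessions.pop(key, None)


dispatcher = Dispatcher()
dispatcher.add_session("x.x.x.x:51335", object())
first = dispatcher.remove_session("x.x.x.x:51335")
second = dispatcher.remove_session("x.x.x.x:51335")  # already removed, no exception
```

With `del self.sessions[key]` the second call would raise the `KeyError` shown in the traceback; with `pop(key, None)` it simply returns `None`.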

shsmith avatar Feb 04 '16 14:02 shsmith

@shsmith good catch, I will have a look

ecdsa avatar Feb 04 '16 16:02 ecdsa

that's fixed in 7949a9349fc9bb1eb1917c7bbbc7a05b1dc3e5bc

ecdsa avatar Feb 04 '16 16:02 ecdsa

Found another unhandled exception that kills a thread:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 79, in do_catch_up
    self.header = self.block2header(self.bitcoind('getblock', (self.storage.last_hash,)))
  File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 146, in bitcoind
    raise BaseException(r['error'])
BaseException: {u'message': u"Can't read block from disk", u'code': -32603}

Same as before, this one did not appear in the log but was sent to the console output.
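
One way to get such tracebacks into the log file instead of only the console is to wrap each thread target. This is a minimal sketch under assumptions: the logger name and the `logged_target` helper are hypothetical, and it is written for Python 3 rather than the Python 2.7 in the tracebacks.

```python
import logging
import threading

log = logging.getLogger("thread-guard")  # hypothetical logger name


def logged_target(func):
    """Wrap a thread target so unhandled exceptions reach the logging system."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except BaseException:
            # BaseException rather than Exception, because the traceback
            # above raises BaseException directly.
            log.exception("unhandled exception in thread %s",
                          threading.current_thread().name)
    return wrapper


def do_catch_up():
    # Stand-in for the failing call; the message mirrors the one above.
    raise BaseException("Can't read block from disk")


worker = threading.Thread(target=logged_target(do_catch_up), name="catch-up")
worker.start()
worker.join()
```

The thread still ends after the exception, but the traceback now goes through whatever handlers are configured, so a file handler would record it.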

shsmith avatar Feb 09 '16 22:02 shsmith

Yet another unhandled exception that kills the TCP thread:

Exception in thread Thread-8:

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "build/bdist.linux-x86_64/egg/electrumserver/stratum_tcp.py", line 243, in run
    poller.modify(session.raw_connection, mode)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 170, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
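
A hedged sketch of how a poll loop could survive a socket that was closed between iterations. This is not the code in stratum_tcp.py: `safe_modify` is a hypothetical helper, and the example assumes Python 3 on Linux, where `select.poll` and `socket.socketpair` are available.

```python
import select
import socket


def safe_modify(poller, sock, mode):
    """Try to change the event mask; return False if the socket is unusable."""
    try:
        poller.modify(sock, mode)
        return True
    except (OSError, ValueError):
        # OSError covers a bad or unregistered descriptor; ValueError is
        # raised when a closed socket reports fileno() == -1.
        return False


a, b = socket.socketpair()
fd = a.fileno()  # remember the number, it is lost once the socket is closed
poller = select.poll()
poller.register(a, select.POLLIN)

ok_before = safe_modify(poller, a, select.POLLIN | select.POLLOUT)
a.close()
ok_after = safe_modify(poller, a, select.POLLIN)  # closed socket, no exception

if not ok_after:
    poller.unregister(fd)  # drop the stale entry instead of killing the loop
b.close()
```

The caller can treat a `False` result as "stop this session" and carry on with the remaining connections, so one bad descriptor does not end the whole thread.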

shsmith avatar Jul 06 '16 01:07 shsmith