WebSocket-for-Python icon indicating copy to clipboard operation
WebSocket-for-Python copied to clipboard

Server crashes with broken pipe if client disconnects ungracefully?

Open EternityForest opened this issue 11 years ago • 23 comments

Hey, Love your library but having a little issue with cherrypy 3.3.0 on python 3.3.3 using chromium Version 31.0.1650.63 Debian jessie/sid (238485) on linux mint.

Basically, i can connect to the websocket, but if i close that tab and reopen it, it does not work. If I call socket.close in the client first, the problem dissapears. the server log shows a broken pipe error here in websocket.py around line 232:

def _write(self, b):
    """
    Trying to prevent a write operation
    on an already closed websocket stream.

    This cannot be bullet proof but hopefully
    will catch almost all use cases.
    """
    if self.terminated or self.sock is None:
        raise RuntimeError("Cannot send on a terminated websocket")
        self.sock.sendall(b)

I found a possible fix, I changed the function to: def _write(self, b): """ Trying to prevent a write operation on an already closed websocket stream.

    This cannot be bullet proof but hopefully
    will catch almost all use cases.
    """
    if self.terminated or self.sock is None:
        raise RuntimeError("Cannot send on a terminated websocket")
    try:
        self.sock.sendall(b)
    except BrokenPipeError:
        pass

And the error dissapears. I assume that ignoring the error is fine because we are closing the whole socket anyway, however I don't know much about the inner workings of ws4py.

EternityForest avatar Nov 05 '14 21:11 EternityForest

Thanks for the write up. Those corner cases are always tricky to reproduce I guess. I will try to find some time to analyze your proposed solution and will merge it if it fits.

Thanks again!

Lawouach avatar Nov 06 '14 08:11 Lawouach

After looking into it a little more, this won't work because BrokenPipeError only exists in python 3. Also, the error always seems to come from the close() function only, as in

Exception in thread Thread-1: Traceback (most recent call last): File "/media/Files/Programming/KaithemAutomation/kaithem/src/thirdparty/ws4py/websocket.py", line 185, in close self._write(self.stream.close(code=code, reason=reason).single(mask=self.stream.always_mask)) File "/media/Files/Programming/KaithemAutomation/kaithem/src/thirdparty/ws4py/websocket.py", line 254, in _write self.sock.sendall(b) File "/usr/lib/python3.3/ssl.py", line 460, in sendall v = self.send(data[count:]) File "/usr/lib/python3.3/ssl.py", line 421, in send v = self._sslobj.write(data) BrokenPipeError: [Errno 32] Broken pipe

The errors are occuring in the websocket manager thread. My current solution is to treat any error that occurs during the call of once() that the manager does as if it returned a None. It would probably be better to not treat all errors the same but for production use I think it makes sense, because if the manager thread is pretty important to keep running.

EternityForest avatar Nov 11 '14 10:11 EternityForest

Not sure if that was what you meant, but my workaround is to simply override the once() function and close the connection if any error occurs running once(). TestWebSocket is my class that inherits from WebSocket:

def once(self):
    try:
        return super(TestWebSocket, self).once()
    except Exception as e:
        print("Closing socket because of error:" + str(e))
        self.close()

Wouldn't a nicer solution to have an "handle_error" function on WebSocket that the user can override if there are any problems? And a try over the once()-call in run()? Its behavior could be to write any unhandled errors to the log and close the connection.

Having crashing the entire server as expected behavior if unhandled errors occurs seems like a quite controversial solution. :-)

nicklasb avatar May 06 '15 14:05 nicklasb

I can write a pull request if you think something like this is a usable solution.

nicklasb avatar May 06 '15 14:05 nicklasb

I do like the handle_error callback handler, specially since it has been often requested.

Lawouach avatar May 06 '15 14:05 Lawouach

Ok, I'll make something.

nicklasb avatar May 06 '15 14:05 nicklasb

Hm. I was going to make this thing now, and when I tried to recreate the issue (with the master branch) i couldn't.

It seems that it had been fixed in some later commit. @EternityForest, can you recreate the problem with the current master?

Also, I am not sure if an error handler is needed now or how it is to be invoked, as there seem to be some other handling somewhere.

nicklasb avatar May 06 '15 19:05 nicklasb

It would appear that a new release would be a good thing. :-)

nicklasb avatar May 06 '15 19:05 nicklasb

Ok, now I managed to recreate it. It was only the client that had started to disconnect more gracefully. I'll be back.

nicklasb avatar May 06 '15 20:05 nicklasb

I got the same issue with a timeout. After that, the server just open/close websocket.

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/ws4py/manager.py", line 310, in run
    if not ws.once():
  File "/usr/local/lib/python2.7/dist-packages/ws4py/websocket.py", line 388, in once
    if not self.process(b):
  File "/usr/local/lib/python2.7/dist-packages/ws4py/websocket.py", line 466, in process
    self._write(s.pong(ping.data))
  File "/usr/local/lib/python2.7/dist-packages/ws4py/websocket.py", line 279, in _write
    self.sock.sendall(b)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
timeout: timed out

tito avatar Mar 22 '17 23:03 tito

IMO, without even trying to reproduce, manager should never stop running due to something bad in a websocket. So catching any exception and force-close the websocket that have an issue should be done in the code to prevent the manager and others websocket to be impacted :)

tito avatar Mar 22 '17 23:03 tito

I have a modified version of the library in use that seems to handle ungraceful disconnect and suspend/wake just fine. I should probably do some more testing with different browsers.

I think there might be a development branch of this repo, or else some pull requests that fix it. I think I might even have submitted some but I'm on mobile and it was a while back.

The code I've been using is here though, and it seems to work fine running 24/7 at a commercial installation albeit without a lot of ungraceful disconnects or anything like that.

https://github.com/EternityForest/KaithemAutomation/tree/master/kaithem/src/thirdparty/ws4py

EternityForest avatar Mar 22 '17 23:03 EternityForest

Thanks for the feedback @EternityForest . I'm currently testing the try/catch in once() in production. I'm not using webbrowser, but ws4py client to ws4py server, using python on android, under very bad network condition. And the server hangs too easily.

I'll check your code, thanks a lot!

tito avatar Mar 22 '17 23:03 tito

I wonder if it wouldn't be helpful to add some of this to the unit tests? Could we create a loopback client that was modular and could​ be plugged into the different servers, that tested things like disconnection, bad data, regression testing that SSL one message long delay queue bug, etc?

EternityForest avatar Mar 23 '17 01:03 EternityForest

Yes, i guess. But some are so network stack related it's hard to get them. After implementing the try catch in once, it got further to another exception:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/ws4py/manager.py", line 312, in run
    fd = ws.sock.fileno()
AttributeError: 'NoneType' object has no attribute 'fileno'

I really wish to have commit access at least to solve thoses issues that can be so easy to get with bad network :(

EDIT: i though you fixed it in your own version, but no

tito avatar Mar 23 '17 18:03 tito

I could add collaborators to this project I guess.

Lawouach avatar Mar 23 '17 18:03 Lawouach

@Lawouach i just wrote you an email as an introduction to a possible collaboration :)

tito avatar Mar 23 '17 18:03 tito

Line 312 in my version doesn't check fileno, and my version has the server thread named "WebsocketManager", so I'm guessing you're using a patched version?

I think my version and the official one have diverged to the point where patches aren't compatible?

Although I'm not seeing what could possibly be setting the sock property to None, unless it has something to do with the Heartbeat thread, but that's only for the client right?

It seems like somehow the once() gets called on an already terminated WebSocket that has no sock() property, and somehow makes it through the polling, or else somehow gets terminated between the poll and the check?

But in my version so far as I know termination can only happen from the manager thread, at least on cherrypy. Is it different in the official version or on other servers?

This is an odd bug.

EternityForest avatar Mar 23 '17 20:03 EternityForest

Here's a proper fork that might be easier to test, that should be an exact copy and paste of what I've been using in production: https://github.com/EternityForest/WebSocket-for-Python

EDIT: Just tested it with Cherrypy, chrome as the client, and the echo_cherrypy_server. If chrome is killed by the SIGKILL when running over SSL, I get: [2017-03-23 14:52:17,012] ERROR Terminating websocket [Local => 127.0.0.1:9000 | Remote => 127.0.0.1:36204] due to exception: SysCallError(-1, 'Unexpected EOF') in once method And then it continues functioning as before. Without SSL, it doesn't even seem to notice that the disconnection didn't happen properly, and prints the normal disconnect message.

EternityForest avatar Mar 23 '17 20:03 EternityForest

I believe this is finally fixed. Please open a new issue if you have any thing going on with latest version (> 0.3.5, use master for now until next version is released.)

tito avatar Mar 24 '17 16:03 tito

Still experiencing the same issue on 0.4.2. My solution is incredibly hacky but connection cleanup works fine. (It's a monkey-patch)

from ws4py.websocket import WebSocket as WebSocket_

class WebSocket(WebSocket_):	
        def _write(self, b):
		"""
		Trying to prevent a write operation
		on an already closed websocket stream.
		This cannot be bullet proof but hopefully
		will catch almost all use cases.
		"""
		if self.terminated or self.sock is None:
			raise RuntimeError("Cannot send on a terminated websocket")

		"""
		If the client disconnects in an unexpected way.
		"""
		try:
			self.sock.sendall(b)
		except BrokenPipeError:
			pass

PhilipTrauner avatar May 26 '17 11:05 PhilipTrauner

Is this error still shutting down the entire server thread? Can we get a full traceback?

EternityForest avatar May 29 '17 08:05 EternityForest

Sometimes a print statement is executed that does not kill the server thread.

Error when terminating the connection: [Errno 32] Broken pipe

Sometimes an exception is raised that does kill the server thread. I'm not able to reproduce it at the moment, will get try to do so as soon as possible.

PhilipTrauner avatar May 29 '17 08:05 PhilipTrauner