run_forever example seems not to work for me
Greetings! I apologize in advance if I am doing something foolish, because Python is not a language I use much, but I find myself needing to build a daemon that communicates with a web socket in order to offer better control of my automated blinds at home, and this project looks like a great solution. However, I can’t seem to figure out how to get my daemon to reconnect when my web socket server goes down for some reason. I read over existing issues, and saw in #580 the instruction that I should open a new issue if I seem to be having problems, so here I am. 😄
I am using the example code in the project README and the only change that I have made is to switch the web socket URL to "ws://localhost:3000/ws" which is where I am running my server as I develop it. When the server is up, everything works great. But as soon as I shut down the server, the Python script immediately exits with the following stack trace:
Traceback (most recent call last):
  File "/Users/james/git/shade/src/python/client.py", line 31, in <module>
    rel.dispatch()
  File "/usr/local/lib/python3.9/site-packages/rel/rel.py", line 205, in dispatch
    registrar.dispatch()
  File "/usr/local/lib/python3.9/site-packages/rel/registrar.py", line 72, in dispatch
    if not self.loop():
  File "/usr/local/lib/python3.9/site-packages/rel/registrar.py", line 81, in loop
    e = self.check_events()
  File "/usr/local/lib/python3.9/site-packages/rel/registrar.py", line 232, in check_events
    self.callback('read', fd)
  File "/usr/local/lib/python3.9/site-packages/rel/registrar.py", line 125, in callback
    self.events[etype][fd].callback()
  File "/usr/local/lib/python3.9/site-packages/rel/listener.py", line 108, in callback
    if not self.cb(*self.args) and not self.persist and self.active:
  File "/usr/local/lib/python3.9/site-packages/websocket/_app.py", line 349, in read
    op_code, frame = self.sock.recv_data_frame(True)
  File "/usr/local/lib/python3.9/site-packages/websocket/_core.py", line 401, in recv_data_frame
    frame = self.recv_frame()
  File "/usr/local/lib/python3.9/site-packages/websocket/_core.py", line 440, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/usr/local/lib/python3.9/site-packages/websocket/_abnf.py", line 338, in recv_frame
    self.recv_header()
  File "/usr/local/lib/python3.9/site-packages/websocket/_abnf.py", line 294, in recv_header
    header = self.recv_strict(2)
  File "/usr/local/lib/python3.9/site-packages/websocket/_abnf.py", line 373, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "/usr/local/lib/python3.9/site-packages/websocket/_core.py", line 524, in _recv
    return recv(self.sock, bufsize)
  File "/usr/local/lib/python3.9/site-packages/websocket/_socket.py", line 122, in recv
    raise WebSocketConnectionClosedException(
websocket._exceptions.WebSocketConnectionClosedException: Connection to remote host was lost.
Is there anything I can do different to enable this great-sounding automatic reconnection behavior, or do I have to write my own code to try to relaunch this daemon whenever it exits?
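In case it helps, here is essentially what I am running, a sketch of the README example with only the URL changed (the handler bodies are the standard ones from that example, nothing specific to my blinds):

import websocket
import rel

def on_message(ws, message):
    print(message)

def on_error(ws, error):
    print(error)

def on_close(ws, close_status_code, close_msg):
    print("### closed ###")

def on_open(ws):
    print("Opened connection")

if __name__ == "__main__":
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp("ws://localhost:3000/ws",
                                on_open=on_open,
                                on_message=on_message,
                                on_error=on_error,
                                on_close=on_close)
    ws.run_forever(dispatcher=rel)   # rel dispatcher, as in the README
    rel.signal(2, rel.abort)         # abort cleanly on Keyboard Interrupt
    rel.dispatch()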
I am having exactly the same issue... haven't found a fix yet.
I can't tell exactly what is going on, so I will list some ideas to try.
The server might be sending data that your client implementation does not handle properly; you might need custom handling depending on what the server is doing. You could compare your run_forever client code with this example as another point of reference. The error you shared looks like it may be coming from rel, so you could also try the long-lived connection example code from before rel was added (see the discussion in #580), although the reconnection behavior most likely would not work with that older code. @bubbleboy14 might have some better ideas; I'm in a rush right now.
What the server is doing is shutting down, and I was counting on run_forever to keep trying to reconnect until the server came back up; instead it was crashing. I tried removing rel as you suggested, and the client still exits when the server shuts down, but without a stack trace now (and it sounds like you might have expected this lack of reconnection).
However, without rel in the picture, things are much better because when I try to start the client before the server is up, it fails and exits immediately, while with rel it would hang in a useless state until I killed it manually.
So I guess I can just give up on the idea that this client can retry, and write my own code outside of it using languages I know much better than Python to respawn the client whenever it exits. (I need to be in Python for the web socket connection because I need to invoke another Python library in response to messages on the web socket.)
If someone who is a Python developer could add reliable reconnection to this library it would be great and make it simpler to use, but going forward without rel at least gives me some options to seek stability.
In any case, thanks for the ideas even while you are busy; if bubbleboy14 has others I will eagerly read them. In case it is of interest or helpful, my code is here: https://github.com/brunchboy/shade/blob/main/src/python/client.py
One more quick update: removing rel is a huge win. I had already configured launchd to try to keep this client daemon alive, and now that it fails cleanly when the server is down during the initial connection attempt, launchd keeps trying to relaunch my daemon until the server comes back up, at which point it succeeds. So I no longer need to worry about manually checking on the process and intervening.
Hey @engn33r, @AbexAbe, @brunchboy. Here's a PR:
https://github.com/websocket-client/websocket-client/pull/838
The idea is that by default (unless you pass reconnect=False), run_forever() handles connection failures (including dropped connections and initial failure to connect) by waiting 5 seconds and trying again.
I tested it with run_forever() and run_forever(dispatcher=rel), but not super extensively, so LMK how/whether this works with yall's code!
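Roughly, the intended usage looks like this (just a sketch of the behavior described above; details may shift as the PR evolves):

# Dropped connections and failed initial connects now retry every 5 seconds by default:
ws.run_forever(dispatcher=rel)

# Pass reconnect=False to keep the old fail-fast behavior:
ws.run_forever(reconnect=False)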
@bubbleboy14, thanks for the PR, it sounds promising even though I am perfectly content now using launchd to restart my daemon when it exits. I would be happy to test it for you, but need some pointers on how to test a Python library that is in a git branch, since I am not a Python developer. How do you tell Python to use something that isn’t released?
Hey @brunchboy, thanks! There are a few options for that. The easiest is probably:
cd wherever/you/want
git clone https://github.com/bubbleboy14/websocket-client.git
cd your/project
ln -s wherever/you/want/websocket-client/websocket
That'll symlink my fork into your project. When you're done, you can just rm the symlink.
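Another option, if the symlink feels fiddly, is to have pip install straight from the fork (this assumes the changes are on the fork's default branch):

pip install --upgrade git+https://github.com/bubbleboy14/websocket-client.git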
thanks again, much appreciated!
Ok, I got it to work! I messed up my symbolic link attempts in several different ways, but once I properly got it created in the same folder as https://github.com/brunchboy/shade/blob/main/src/python/client.py I am seeing the new reconnect behavior work as desired. Thanks!
I am wondering if it might be nice to log when it is retrying, so it is clear at a glance from the log file when the remote server is not up? But even without that, it is clearly working, I started with the server down and brought it up and down a few times and the Python client reliably connected when it could and stayed running.
@brunchboy I added connected/disconnected notifications to setSock()/handleDisconnect() using _logging.warning().
these logs can be turned on with "websocket._logging.enableTrace(True)".
@engn33r is that the right approach, or should we be logging differently in this case?
@bubbleboy14 thanks, I'll take a closer look at the PR soon, but it looks good at first glance. I think it would be good to add flexibility instead of hardcoding a 5-second delay in handleDisconnect() before reconnecting; I suspect many applications will want an instant reconnection attempt. CI caught a couple of linting issues and two unit tests that are now failing, but I'll look more closely soon and it's very possible the tests need rewriting.
And yes, _logging.warning() is the correct approach here.
Good call @engn33r! The "reconnect" kwarg now specifies the sleep() time.
That is a nice way to configure the sleep time. I did add a note on the commit, though: the log message should probably report the actual configured sleep time, rather than the hard-coded mention of 5 seconds it currently has.
Ok, I definitely see the new log lines when I call enableTrace(True), but since these are warnings, it seems like there should be a way to see them without all the other verbose trace information? Or am I missing something?
Finally, I notice that if I leave my server down for a while, and then kill the client with a control-C after it has retried a few times, that each time it retries it is adding another stack frame to the exception handler chain. Isn’t that going to eventually cause a stack overflow if the server is down for a long time? Is there a way to retry without accumulating stack frames like that?
Even if I eventually bring my server up and the client connects and proceeds to operate normally, when I later kill it, I see in the stack trace that all those frames from retrying when the server was down are still there.
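To illustrate what I mean with a toy sketch (this is not the library's code, just the pattern I think I am seeing in those tracebacks): retrying by calling back into the connect logic piles up frames, while retrying in a loop does not.

import time

def connect_recursively(attempts_left=3):
    """Each failed attempt calls itself again, so a new frame is added per retry."""
    try:
        raise ConnectionError("server down")  # stand-in for a failed connect
    except ConnectionError:
        if attempts_left > 0:
            time.sleep(1)
            connect_recursively(attempts_left - 1)  # stack keeps growing

def connect_in_loop(attempts=3):
    """Retrying inside a loop reuses the same frame on every attempt."""
    for _ in range(attempts):
        try:
            raise ConnectionError("server down")  # stand-in for a failed connect
        except ConnectionError:
            time.sleep(1)  # wait and go around again, no new frame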
Great catch @brunchboy, try the latest version!
I added a stack frame count to the disconnect log, simplified various things, and streamlined (/ made more consistent) a few connection failure cases.
Now, if you use a custom dispatcher (like pyevent or rel), the reconnect logic uses a timeout function that avoids pushing another frame onto the stack.
However, the default (non-custom-dispatcher) mode unfortunately still accumulates stack frames on reconnect. Any ideas, @engn33r?
That sounds like nice progress. However, I can't test it because I found that using rel led to worse outcomes than not using it (there were circumstances in which my client hung, when it was better for it to die and be respawned by launchd), so I'm in the situation that still accumulates stack frames for the moment.
I mean, I could test whether the reconnection warnings appear even if you don’t turn on tracing, but it doesn’t look like that is supposed to be true yet, if I have been following the work correctly? (I am mostly focused on my own astronomical calculations and features, so it’s possible I missed that.)
@brunchboy try latest :)
I added an info() function to the _logging module and a level kwarg (default "DEBUG") to enableTrace().
So change your enableTrace call to this: websocket._logging.enableTrace(True, level="INFO") and you'll only see disconnect/reconnect logs.
Lemme know what you think, thanks for trying!
Yes, that appears to work great, thanks again!
I merged #838, which fixes this issue.
I am using the latest README example from the project, and I am getting an error on the line below:
ws.run_forever(dispatcher=rel, reconnect=5) # Set dispatcher to automatic reconnection, 5-second reconnect delay if the connection closed unexpectedly
TypeError: WebSocketApp.run_forever() got an unexpected keyword argument 'reconnect'
I am facing the same error.
@reasadazim @tushargoyal22 that's odd. Are you sure you're using the latest version of websocket-client? Could you please provide a concise test case that reproduces the issue?
Thanks, man. I was using an old version of websocket-client. I updated the websocket-client package and the error disappeared.
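For anyone else who hits this TypeError: the reconnect keyword only exists in newer releases, so it is worth checking which version is installed and upgrading if needed (assuming a pip-based install):

python -c "import websocket; print(websocket.__version__)"
pip install --upgrade websocket-client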