remi
Freeze after wifi drop
Hello,
First, I have to say this project is really awesome and I love it!
Unfortunately, I have one problem. If I open the web page on a mobile phone and then disable wifi on the phone, the server hangs and it is impossible to open the web page from another device.
remi.server INFO Started httpserver http://127.0.0.1:8081/
remi.request INFO Authenticating
192.168.0.59 - - [02/Dec/2018 22:35:59] "GET / HTTP/1.1" 401 -
remi.request INFO built UI (path=/)
remi.request DEBUG get: /
192.168.0.59 - - [02/Dec/2018 22:35:59] "GET / HTTP/1.1" 200 -
remi.server.ws INFO connection established: ('192.168.0.59', 52236)
remi.server.ws DEBUG handle
remi.server.ws DEBUG handshake
remi.server.ws INFO handshake complete
remi.server.ws DEBUG send_message: 0306000353... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG send_message: 3... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG on_message: connected
remi.server.ws DEBUG send_message: 3... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG on_message: callback
remi.request DEBUG App.onload event occurred
remi.server.ws DEBUG send_message: 3... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG on_message: callback
remi.request DEBUG App.onpageshow event occurred
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
remi.request DEBUG get: /favicon.ico
remi.request DEBUG get: /favicon.ico
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
Somewhere around here I disable wifi on the phone.
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
remi.server.ws DEBUG send_message: 1306009182... -> ('192.168.0.59', 52236)
At this point it hangs forever. Sometimes, if I re-enable wifi on the phone, this line shows up and then the server acts normally.
remi.server.ws DEBUG ws ending websocket service
What can be wrong?
Thank you.
Hello @Navarro11, thank you for reporting. Which version are you using? Can you post your script here?
I will be unable to debug this for about 3 days. Can you please run a test with one of the examples? Does the same thing happen with them? Make sure the multiple_instances parameter has the same value: change it in the example to match the value in your script.
I've tried widgets_overview_app.py and the result is the same. Some time after the wifi drop on the mobile phone, the web page is not accessible from any device.
@Navarro11 you found an important bug in this version. I will debug it and prepare a fix in a few days. While waiting for the fix, you can use the previous version, 1.2.2.
@Navarro11 I pushed a possible fix on master branch. Can you please install it directly from GitHub (not pypi) and let me know if this fixes the problem?
@dddomodossola I'm really sorry but this didn't help. Behaviour after disconnecting phone is the same.
@Navarro11 thank you for the test.
Can you please confirm that you updated your remi version with the following command?
pip install git+https://github.com/dddomodossola/remi.git
Furthermore, which python version are you using?
I don't have git installed, so I used the zip file:
pip3 install https://github.com/dddomodossola/remi/archive/master.zip
I use Python 3.5.3. I tried it with Python 2.7 as well and nothing changed.
@Navarro11 I will do some additional tests and provide you with a solution ;-)
@Navarro11 I tried to reproduce your problem with different Python versions, but it works correctly. Do you have any further info that could help?
Hello @dddomodossola, I thought the wifi router was the problem, but I tried connecting via a hotspot on the mobile phone and the behaviour is still the same. I created a hotspot on the phone and connected the raspberry and the pc to it. Via ssh from the phone, I ran the script on the raspberry. I opened the web page on the pc and after a while I blocked the pc on the hotspot, so the pc was forcibly disconnected. After that, the web page opened on the phone stopped updating and a refresh was not successful. Then I tried to cancel the script over ssh with ctrl+c, but an error message popped up in server.py saying "endpoint is not connected" or something like this, and the script was not cancelled, so I had to restart ssh.
@Navarro11 thank you a lot for helping me investigate this problem. Which command did you execute to start the script? Have you used nohup?
I run the script simply with python3 main.py.
@Navarro11 you should run it with nohup python3 main.py & . This allows the ssh connection to be closed without stopping the applications started from it. The problem could be caused by this. Please give it a try when you can. ;-)
@dddomodossola Of course I know this :). This is not the problem. I mentioned it because when no device has been forcibly disconnected, the script can be cancelled with ctrl+c without a problem, but after a forced disconnect the script is not cancelled, so I think the script is hanging somewhere and that is why the web page stops working. I will make a video this afternoon showing the pc and phone screens so it is easier to understand :).
@Navarro11 I mean that, if you don't use nohup, after the wifi disconnection it could happen that the remi script becomes unreachable because the server gets stopped. However, ok, from the video I will be able to understand better.
@dddomodossola Here is the video. There are a few subtitles, I hope it will help.
@dddomodossola this smells like a long socket timeout (Windows? or another library messing with socket.setdefaulttimeout) combined with remi pinging connected clients in a loop, which means that one blocking client can hang the server.
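To make the hypothesis above concrete, here is a minimal standalone sketch. It is not remi's code, and the names send_with_deadline and demo_stalled_send are made up for illustration. It assumes the unreachable client behaves like a peer that never reads from its socket: with a long or missing timeout the send would block indefinitely, while a short timeout turns the stall into an exception the server can handle.

```python
import socket


def send_with_deadline(sock, payload, timeout_s):
    """Return True if the payload was fully sent, False if the send timed out."""
    sock.settimeout(timeout_s)  # bound how long sendall() may block
    try:
        sock.sendall(payload)
        return True
    except socket.timeout:
        return False


def demo_stalled_send():
    """Send to a peer that never reads; return how many chunks went out
    before a send timed out (None if no send ever timed out)."""
    # socketpair() stands in for a server socket and a client that vanished.
    server_side, stalled_client = socket.socketpair()
    try:
        chunk = b"x" * 65536
        for sent_chunks in range(10000):
            if not send_with_deadline(server_side, chunk, timeout_s=0.2):
                return sent_chunks
        return None
    finally:
        server_side.close()
        stalled_client.close()


if __name__ == "__main__":
    print("chunks accepted before the send stalled:", demo_stalled_send())
```

Without the settimeout call, the same sendall would simply never return once the buffers are full, and a server that sends to its clients one after another would stop serving all of them.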
@Navarro11 thank you for the great video. @nzjrs that's possible, maybe the long socket timeout blocks the thread lock. I need to test it further.
@Navarro11 @nzjrs I was able to reproduce the problem. It is caused exactly as @nzjrs says. A long socket timeout freezes the application. The most common situation happens when a client reconnects to the server, and that should be fixed now. You can test it @Navarro11 , now it should work for you.
However, I plan to rewrite the websocket implementation to make it non-blocking.
@dddomodossola Unfortunately, nothing changed. There has to be something wrong with my raspberry... Anyway thank you very much for your effort :)
@Navarro11 I doubt it could be caused by your raspberrypi. I will do some additional tests soon. Thank you for your collaboration ;-)
Hi Davide,
I'm currently running remi==2019.11, and I confirm that this bug is present in this version. Each time one of the connected clients has a connection break, the server totally hangs a few seconds later. I say "a few seconds", but after some investigation I realized that it's not related to time (like a timeout), but rather to the number of messages sent (or received, I don't really know).
I planned to send you more precise information and test results on Thursday. However, before that -- as my remi version is almost 1 year old -- I would like to know if there is any chance that you already solved the issue in newer versions?
Many thanks.
@batzkass I don't know. Can you please send me an example script to reproduce the problem?
Sure, on Thursday. It will be a list of steps to reproduce based on your example scripts rather than a simple script.
Hi,
I finally found some time to work on that today. I had to write a tiny script which basically shows a series of 2000 random integers, and changes the 2000 values with an idle function every 0.2s (see http://kneib.fr/helloworld_app.py). I use the latest remi version available on pypi.
I made a video of the bug here: https://www.youtube.com/watch?v=_Ep8PV37xmk
On my PC, I launch the server and I use Firefox to connect to it. Then I use my phone to connect to the server through wifi (video streaming from my phone is done through usb). At this point everything is fine, the ints are updating. I then abruptly cut off the wifi on my phone, and after a few seconds the server freezes forever (the 2000 integers are not updated anymore, and no new connections are possible). I also tried connecting my phone to the server before my pc and the result is the same. Note that, once frozen, the first ctrl+c will show something in the log, then after that nothing more happens (I need to kill the process).
You may wonder why I put 2000 ints and not just 1. This is because I realized that the time between the wifi shutdown and the server freeze depends strongly on the amount of data sent during this period. In fact, I think that the server freezes after a certain amount of data has been sent from the server through the websocket. For example, in the video it takes about 8s, but when I put 1000 ints I get about 15s, and so on.
To my understanding, remi can't deal with websockets that abruptly stop responding. It is as if a send buffer were slowly filled until it is totally full. I had to use the wifi trick to reproduce a real connection failure, as killing a browser (with SIGKILL on linux) makes the OS close the websocket gracefully, and remi is happy with that.
In fact, this issue has haunted me for as long as I have used remi (a few years), as sometimes the client PCs are connected through 3G/4G or experience wifi drops. For me it was just "the remi server crashes sometimes, it's not very stable", but I just found this issue on GitHub and decided to investigate further.
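The "buffer fills up" theory can be checked outside of remi with a short sketch. This is only an illustration under the assumption that the vanished client behaves like a peer that never reads; bytes_until_blocked is a made-up helper name. It counts how many bytes the OS accepts on a socket before a send would have to wait, which is a byte budget rather than a time limit.

```python
import socket


def bytes_until_blocked(chunk_size=4096, limit=1 << 30):
    """Write to a socket whose peer never reads and return how many bytes
    the OS accepted before a send would have had to block."""
    sender, silent_peer = socket.socketpair()
    sender.setblocking(False)  # raise instead of waiting when buffers are full
    chunk = b"\0" * chunk_size
    accepted = 0
    try:
        while accepted < limit:  # safety cap so the loop always terminates
            try:
                accepted += sender.send(chunk)
            except BlockingIOError:
                break  # buffers are full: a blocking send would hang here
        return accepted
    finally:
        sender.close()
        silent_peer.close()


if __name__ == "__main__":
    print("bytes buffered before blocking:", bytes_until_blocked())
```

The exact figure depends on the OS and socket type, but once that budget is used up, a blocking send waits until the peer reads or the connection is declared dead. That would match the observation that sending more data per update makes the freeze arrive sooner.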
Don't hesitate to tell me if I can help for debugging. Many thanks for remi and for your help.
@batzkass I tried your script several times but I can't reproduce the problem. However, from your video I can see a "broken pipe" error caused by the forced disconnection. I see that after the wifi disconnection the messages continue to be printed on the console, so I suppose it is not caused by a long socket timeout. Maybe we just need to catch the broken pipe error. I will test it on a linux machine to reproduce the problem and hopefully fix it.
Well, linux could be the common point with @Navarro11, who was running the server on a raspberry, so probably on raspbian. The "broken pipe" error occurs when I press ctrl-c, so I think this specific error may well be caused by ctrl-c itself. In my opinion, it is rather that the server tries to send messages to a disconnected client, and hence progressively fills its send buffer. If I'm right about this scenario, maybe the solution could be to close the connection for clients that don't acknowledge the messages after a certain time.
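A rough sketch of that idea, written against plain sockets rather than remi's websocket code (broadcast and demo_broadcast are hypothetical names, and the 0.2s deadline is an arbitrary choice): each send gets a deadline, and any client that cannot take the message in time is closed and dropped so it cannot stall the others.

```python
import socket


def broadcast(clients, payload, deadline_s):
    """Send payload to every client; close and drop any client whose send
    fails or does not complete within deadline_s. Returns the survivors."""
    survivors = []
    for sock in clients:
        sock.settimeout(deadline_s)
        try:
            sock.sendall(payload)
            survivors.append(sock)
        except OSError:  # includes socket.timeout and broken pipe
            sock.close()
    return survivors


def demo_broadcast(max_rounds=100000):
    """One peer keeps reading, the other never does. Returns a pair:
    (healthy client still connected, stalled client still connected)."""
    healthy_srv, healthy_peer = socket.socketpair()
    stalled_srv, stalled_peer = socket.socketpair()
    clients = [healthy_srv, stalled_srv]
    payload = b"x" * 1024
    try:
        for _ in range(max_rounds):
            clients = broadcast(clients, payload, deadline_s=0.2)
            if healthy_srv in clients:
                # The healthy peer consumes exactly what was just sent.
                remaining = len(payload)
                while remaining:
                    data = healthy_peer.recv(remaining)
                    if not data:
                        break
                    remaining -= len(data)
            if stalled_srv not in clients:
                break  # the unresponsive client has been evicted
        return healthy_srv in clients, stalled_srv in clients
    finally:
        for s in (healthy_srv, healthy_peer, stalled_srv, stalled_peer):
            s.close()


if __name__ == "__main__":
    print(demo_broadcast())
```

In this sketch the stalled client costs the loop at most one deadline before it is removed, and the healthy client keeps receiving updates. A real fix would also need to take care of the websocket framing and locking, which are not modelled here.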
@batzkass this could be a good solution, I will test on raspbian today ;-)