tinypilot icon indicating copy to clipboard operation
tinypilot copied to clipboard

Pastes stop after dozens to hundreds of characters

Open scirelli opened this issue 3 years ago • 6 comments

Description

Environment

Hardware device: r-pi 0w v1 Broswer: Chrome
When pasting text of any size input on the receiving end randomly stops. I say random because I can't tell what causes it to end the stream. Sometimes I'll be able to paste, I'd say 100 characters, other times it stops after maybe 15-20. I'm not sure why it fails, and there's no feedback in the UI.

What's the behavior that you expect?

Pasting any size text to be streamed to remote computer.

What's happening instead?

Characters stop getting typed after a random amount sent.

What are the steps to reproduce this behavior?

  1. Open TinyPilot in Chrome
  2. Click on menu item to paste text
  3. Paste text
  4. Not all text shows up on remote computer

Screenshots

Nothing to show.

Logs

~I will have to reproduce this again and copy out the logs. It consistently happens so I'll be able to reproduce it again tonight.~

Chrome Console logs: https://logs.tinypilotkvm.com/gZO4HDsJ UI Logs: https://logs.tinypilotkvm.com/l1JI99D8

scirelli avatar Jun 23 '22 13:06 scirelli

It's hard to say what's going on without the logs. Can you see if you can capture logs after it happens?

In addition the the device logs, it would be good to see the browser console logs as well in case it's failing before the browser requests ever reach the TinyPilot device.

mtlynch avatar Jun 24 '22 19:06 mtlynch

Attached

scirelli avatar Jun 24 '22 20:06 scirelli

I think it might have to do with the network. Happened less when the sending device was wired

scirelli avatar Jun 24 '22 21:06 scirelli

I'm not seeing any hints in the logs. There were these lines:

Jun 23 09:50:55 rpi0w-keyboard python[1048]: BrokenPipeError: [Errno 108] Cannot send after transport endpoint shutdown

But those occurred hours before you collected the logs, so I'm assuming that's not part of you attempt to reproduce this.

I tried pasting a long sequence of text from Moby Dick, and I was able to reproduce the issue on both a real device and a dev system. The paste seems to fail after around 2000 characters in dev mode.

I added debug logging to the JS, and I see it emit all the characters from the browser, but the backend doesn't receive them. At a certain point, the backend just stops processing keystroke messages, and there's no error.

My best guess is that we're causing unexpected behavior by flooding the websockets channel, though I can't find anything in socket.io's docs about that. I tried upgrading to the socket.io JS client v4.5.0, and I see the same behavior.

I suspect that we could solve this by buffering the keystrokes instead of sending them immediately, so instead of sending all keystrokes as we see them, we could send a maximum of N keystrokes until we received acknowledgement from the backend that the keystroke was processed successfully.

mtlynch avatar Jun 27 '22 21:06 mtlynch

I think it might have to do with how the fast the receiving computer reads keys strokes. To fast and it does't get the release stroke, or possible bytes get miss aligned? Not sure, but slowing key strokes down seemed to work.

scirelli avatar Jul 08 '22 17:07 scirelli

I think it might have to do with how the fast the receiving computer reads keys strokes. To fast and it does't get the release stroke, or possible bytes get miss aligned? Not sure, but slowing key strokes down seemed to work.

I still think the failure is in the network link. With debug logging enabled, I can see the client (browser) send keystrokes, and the TinyPilot backend just stops receiving them at all, so it doesn't even attempt to send them to the target machine.

My hypothesis is that socket.io has some sort of undocumented maximum buffer, and we're exhausting it. If you're talking about slowing down the keystrokes from the client, then my hypothesis would explain why that works (the buffer doesn't fill up). If you're talking about slowing down keystrokes at the server end, then that would disprove my hypothesis.

mtlynch avatar Jul 08 '22 18:07 mtlynch

A user on the forum has also encountered this issue when pasting larger quantities of text. Their current workaround of batching their text in multiple paste actions works - but is obviously frustrating.

db39 avatar Jun 13 '23 12:06 db39

Just to add some info. I was using this on the Nintendo Switch, the Switch has some arbitrary keystroke entry delay. You can not type fast on the Switch.

scirelli avatar Jun 13 '23 14:06 scirelli

This issue could be just a bug in our socket.io client or server libraries. As a first step, we should upgrade to the latest and greatest versions of both ends and see if that resolves the issue.

mtlynch avatar Jul 06 '23 13:07 mtlynch

One random idea (which I haven’t thought through, though), which might also play nicely together with our plan to make the paste feature a dialog: would it be possible, and potentially more robust, to send the pasted text via the HTTP API, and then convert the characters internally to a sequence of keystrokes that we pass on to the HID interface? The text only wouldn’t appear in the keystroke history then, but I’m not sure how important that is anyway, given that the keystroke history is limited to half a dozen entries. If it’s important to us to give UI feedback after the paste operation, there might be other possibilities to do so, that might also be better suited for longer texts such as Moby Dick 😁

jotaen4tinypilot avatar Jul 07 '23 19:07 jotaen4tinypilot

@jotaen4tinypilot - Oh, yeah. That's a good idea.

When I implemented paste, I debated between doing the text-to-keystroke conversions server-side or client-side, and I did them client-side just for the sake of expediency. But server-side, we could have better testing, and it makes more sense to send everything in one big request than to flood the websockets channel with a million little keystroke requests.

If the fix is as simple as upgrading socket.io libraries, I still think that would be good enough, but if it's a choice between moving the logic server-side and implementing batching logic client-side, we should move the logic server-side.

mtlynch avatar Jul 07 '23 19:07 mtlynch

Quick & dirty UI mockup:

Screenshot 2023-07-07 at 21 57 58

I.e., not echoing the individual characters of pasted text one-by-one in the status bar, but showing a different kind of keystroke indicator, that summarizes the paste action.

jotaen4tinypilot avatar Jul 07 '23 19:07 jotaen4tinypilot

Cool, that looks better than what we're currently doing for paste!

If we move server-side, I think we should probably just skip the character echo.

The main purpose of the character echo is to give fast visual feedback for keystrokes to compensate for the jarring feeling of typing with a delay. It's not as important for pasting text.

The fact that we currently echo pasted text in the keystroke history is just a side-effect of the paste functionality piggybacking on existing plumbing, not something we chose to do deliberately.

mtlynch avatar Jul 07 '23 21:07 mtlynch

Quick update

This issue could be just a bug in our socket.io client or server libraries. As a first step, we should upgrade to the latest and greatest versions of both ends and see if that resolves the issue.

Updating our socket.io libraries on both the client and server side, doesn't seem to fix the issue

  • Server side:
    # requirements.txt
    -Flask-SocketIO==5.0.1
    +Flask-SocketIO==5.3.4
    -python-engineio==4.0.1
    -python-socketio==5.1.0
    +python-engineio==4.5.1
    +python-socketio==5.8.0
    
  • Client side:
    # app/templates/index.html
    -    <script src="/third-party/socket.io/3.1.3/socket.io.min.js"></script>
    +    <script src="/third-party/socket.io/4.7.1/socket.io.min.js"></script>
    

I tried pasting a long sequence of text from Moby Dick, and I was able to reproduce the issue on both a real device and a dev system. The paste seems to fail after around 2000 characters in dev mode.

I enabled socketio logging on the server side and retried the above test:

# app/socket_api.py
-socketio = flask_socketio.SocketIO()
+socketio = flask_socketio.SocketIO(engineio_logger=True, logger=True)

From the logs, you can see the moment the paste starts failing:

2023-07-25 14:40:57.478 engineio.server INFO kLYh-qEY47rUf5lOAAAA: Received packet MESSAGE data 2648["keystroke",{"metaLeft":false,"metaRight":false,"altLeft":false,"altRight":false,"shiftLeft":false,"shiftRight":false,"ctrlLeft":false,"ctrlRight":false,"key":" ","code":"Space"}]
received event "keystroke" from jZE8Oj_LMPZS5xZqAAAB [/]
2023-07-25 14:40:57.478 socketio.server INFO received event "keystroke" from jZE8Oj_LMPZS5xZqAAAB [/]
2023-07-25 14:40:57.491 socket_api      DEBUG [SENSITIVE] received keystroke message: {'metaLeft': False, 'metaRight': False, 'altLeft': False, 'altRight': False, 'shiftLeft': False, 'shiftRight': False, 'ctrlLeft': False, 'ctrlRight': False, 'key': ' ', 'code': 'Space'} [/SENSITIVE]
2023-07-25 14:40:57.493 hid.write       DEBUG [SENSITIVE] writing to HID interface /dev/null: 0x00 0x00 0x2c 0x00 0x00 0x00 0x00 0x00 [/SENSITIVE]
2023-07-25 14:40:57.512 hid.write       DEBUG [SENSITIVE] writing to HID interface /dev/null: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 [/SENSITIVE]
kLYh-qEY47rUf5lOAAAA: Client is gone, closing socket
2023-07-25 14:40:57.544 engineio.server INFO kLYh-qEY47rUf5lOAAAA: Client is gone, closing socket
2023-07-25 14:40:57.548 socket_api      INFO Client jZE8Oj_LMPZS5xZqAAAB disconnected
kLYh-qEY47rUf5lOAAAA: Client is gone, closing socket
2023-07-25 14:40:57.548 engineio.server INFO kLYh-qEY47rUf5lOAAAA: Client is gone, closing socket

<---------------------- PASTE STARTS FAILING NOW ---------------------->

kLYh-qEY47rUf5lOAAAA: Received packet MESSAGE data 2649["keystroke",{"metaLeft":false,"metaRight":false,"altLeft":false,"altRight":false,"shiftLeft":false,"shiftRight":false,"ctrlLeft":false,"ctrlRight":false,"key":"t","code":"KeyT"}]
2023-07-25 14:40:57.549 engineio.server INFO kLYh-qEY47rUf5lOAAAA: Received packet MESSAGE data 2649["keystroke",{"metaLeft":false,"metaRight":false,"altLeft":false,"altRight":false,"shiftLeft":false,"shiftRight":false,"ctrlLeft":false,"ctrlRight":false,"key":"t","code":"KeyT"}]
received event "keystroke" from None [/]
2023-07-25 14:40:57.549 socketio.server INFO received event "keystroke" from None [/]
None is not connected to namespace /
2023-07-25 14:40:57.549 socketio.server WARNING None is not connected to namespace /
kLYh-qEY47rUf5lOAAAA: Received packet MESSAGE data 2650["keystroke",{"metaLeft":false,"metaRight":false,"altLeft":false,"altRight":false,"shiftLeft":false,"shiftRight":false,"ctrlLeft":false,"ctrlRight":false,"key":"o","code":"KeyO"}]
2023-07-25 14:40:57.550 engineio.server INFO kLYh-qEY47rUf5lOAAAA: Received packet MESSAGE data 2650["keystroke",{"metaLeft":false,"metaRight":false,"altLeft":false,"altRight":false,"shiftLeft":false,"shiftRight":false,"ctrlLeft":false,"ctrlRight":false,"key":"o","code":"KeyO"}]
received event "keystroke" from None [/]
2023-07-25 14:40:57.750 socketio.server INFO received event "keystroke" from None [/]
None is not connected to namespace /

From what I understand, the "Client is gone, closing socket" message means that the client-server connection couldn't be maintained the connection is dropped. I suspect that the many keystroke messages are blocking the server from responding to the connection pings.

jdeanwallace avatar Jul 25 '23 14:07 jdeanwallace

Ah, good finding.

Yeah, it seems like we're probably just sending too fast and flooding the channel, though that result still surprises me a little. It doesn't seem like we're sending so quickly that we'd essentially DoS the channel, but I also don't think it's worth digging into deeper.

I think we should proceed with Jan's proposal of sending the characters via HTTP and converting them to keystrokes server-side.

Can you check in the socket.io version change PR, though? Might as well be on latest and greatest.

mtlynch avatar Jul 25 '23 18:07 mtlynch

Can you check in the socket.io version change PR, though? Might as well be on latest and greatest.

Done

  • https://github.com/tiny-pilot/tinypilot/pull/1526
  • https://github.com/tiny-pilot/tinypilot/pull/1527

jdeanwallace avatar Jul 26 '23 12:07 jdeanwallace

@mtlynch

I think we should proceed with Jan's proposal of sending the characters via HTTP and converting them to keystrokes server-side.

I'm not totally convinced that we are reaching some (possibly undocumented) buffer size limit because the default buffer size limit (in polling mode) seems to be 1MB, which is >1million characters.

Similar to @jotaen4tinypilot's idea, instead of sending the pasted text via a HTTP API I propose adding a bulk version of the keystroke endpoint that accepts multiple keystrokes at once instead of just one at a time.

jdeanwallace avatar Jul 26 '23 14:07 jdeanwallace

Yep, I agree with a bulk endpoint to send the full paste in a single request.

I think we should make it a regular HTTP POST rather than a websockets endpoint because HTTP POST is easier to debug and reason about, and we don't need the performance benefit of websockets.

mtlynch avatar Jul 26 '23 15:07 mtlynch

I'm having trouble resolving this issue for when the client sends many keystrokes right after each other (which just happens to be exactly how our paste functionality works): https://github.com/tiny-pilot/tinypilot/blob/330978abc7288b64d61f279fd687816f3cfde3f6/app/static/js/app.js#L258-L260

I've tried a couple things, which didn't work:

  1. Send keystrokes in series, waiting for a callback after each keystroke This made the pasting functionality extremely slow, so not a viable option.
  2. Calling socketio.sleep() in a loop while waiting for a daemon process to finish writing to the HID interface The same bug occurred (i.e. "Client is gone, closing socket")
  3. Putting the keystroke and keyRelease socketio endpoint into their own namespace (in case they were blocking the normal ping/pong keepalive messages) The same bug occurred
  4. Force the ping/pong messages to fire every second to keep the client/server connection alive I could see the ping/pong messages being logged on the server, between all the keystroke messages, but the client/server connection still dies after a while (i.e., same bug occurred)
  5. Don't even write to the HID interface and just call socketio.sleep(1) instead Sometimes I could get through all the pasted keystrokes without the client/server disconnecting, but repeated tests resulted in the same bug still occuring 🤯 I don't get it, calling socketio.sleep() should be unblocking our single threaded server.

jdeanwallace avatar Aug 11 '23 14:08 jdeanwallace

@jdeanwallace - Nothing comes to mind, but can you summarize what we know about the bug so far?

My original hypothesis was that we were flooding the websockets channel, but it seems like that's unlikely given further testing.

Now it seems like something is causing the server to stop responding to websockets / HTTP requests while it's processing a long sequence of keystrokes. Do we know what's causing it to stop responding?

Don't even write to the HID interface and just call socketio.sleep(1) instead

Where did we skip the write? Within the spawned process in hid/write.py? Or before we spawn a new process?

My suspicion is that we're doing something wrong in app/hid/write.py. We're spawning a new process for each HID event, so spawning so many short-lived processes seems like it might interfere with Flask staying responsive.

In particular, this snippet of the docs makes me think we don't fully understand how Flask handles multiple processes:

Flask, as a WSGI application, uses one worker to handle one request/response cycle. When a request comes in to an async view, Flask will start an event loop in a thread, run the view function there, then return the result.

Each request still ties up one worker, even for async views. The upside is that you can run async code within a view, for example to make multiple concurrent database queries, HTTP requests to an external API, etc. However, the number of requests your application can handle at one time will remain the same.

https://flask.palletsprojects.com/en/2.3.x/async-await/#performance

I'm wondering if we need to be using eventlet-specific threading APIs the same way we do to avoid locking up the server in update_logs.py. Like maybe we should be using tpool?

But before we get into trying new fixes, I'd like to just summarize where we are now. Partly because it's a good rubber duck debugging exercise, and partly so @jotaen4tinypilot can weigh in when he gets back.

mtlynch avatar Aug 11 '23 18:08 mtlynch

2023-08-14 – What we know so far

Background about SocketIO behaviour:

  1. Server sends ping event to client every 25s (i.e., ping_interval) and expects a pong event in return.
  2. Server drops client connection when pong event is not received within 5s (i.e., ping_timeout).
  3. Client only detects that is has been disconnected much later then finally triggers its disconnect event handler with the reason being "ping timeout".

TinyPilot behaviour:

  1. When pasting text into the web-UI, the client emits a keystroke event for every character.
  2. Server receives a keystroke event, but only responds until after it has written to the HID interface.
  3. While the server is processing the many keystroke events:
    1. Server stops sending ping events to the client.
    2. Server notices that it hasn't received a pong event within the expected ping_timeout, and drops the client connection
    3. Server ignores all subsequent keystroke events.
  4. After some time the client notices that it has been disconnected and reconnects successfully

What didn't work:

  • On TinyPilot step (2), when receiving a keystroke event then immediately return {'success': True}
    • Exact same behaviour occurs

Main issue:

  • Why is the SocketIO server being blocked from sending ping events to the client when its not even writing to the HID interface or spawning a process/thread?

jdeanwallace avatar Aug 14 '23 17:08 jdeanwallace

@jotaen4tinypilot - Any ideas?

mtlynch avatar Aug 14 '23 19:08 mtlynch

Unfortunately, I don’t have a promising lead in mind here. My naive suspicion would have been what @jdeanwallace proposed in https://github.com/tiny-pilot/tinypilot/pull/1545, but apparently, the issue seems to be elsewhere.

I experimented with this briefly, and I’m wondering whether there is something wrong with how we do threading in general, at least in regards to the SocketIO side of things.

  • The disconnect/stall issue also seems to appear in other multi-threading situations, e.g. when uploading a large disk image. If the client continues to send keystrokes while a disk image upload is in progress in the background, the server also stops to process keystrokes at some point. Later, the client reconnects automatically, i.e. there is a brief “disconnected” state, after which the socket functionality seems to have recovered. So it looks like we are facing the same problem there.
  • I noticed that the “anti-locking” mechanism in hid/write.py is quite slow. It causes a noticeable lag not just when pasting text, but also when typing in a fast manner. I experimented with skipping the multi-process mechanism and calling _write_to_hid_interface_immediately directly instead, which speeds up the HID write significantly (by like 2 or 3 times). Doing so would of course introduce a regression in regards to the locking, but the high latency doesn’t feel like a right tradeoff to me. It also doesn’t fix our issue here, but at least it might be additional indication that there is something wrong with our approach in hid/write.py.

For the lack of other good ideas, I’m thinking the following:

  • It would be worth to dive deeper into the threading setup of Flask and Flask-SocketIO, just to gain a better understanding of why the socket endpoint freezes at some point in situations where the server is busy processing things. For example, why does Flask-SocketIO becomes “blocked” internally, while the rest of the Flask server stays responsive (e.g., the HTTP endpoints)? Or, why is it that socketio.sleep doesn’t have any effect? To me, it looks like there is something off with our threading setup, or with the way Flask and Flask-SocketIO are wired in regards to threading.
  • We should consider to revisit the “anti-locking” mechanism in hid/write.py. I’d intuitively agree with @mtlynch that I’m not sure we are doing the right thing there (see comment above). I’m not sure this is related directly, but I’m wondering whether it somehow contributes to the problem.
  • Seeing that HID writes have quite a high latency anyway (even without the “anti-locking” mechanism), I’m wondering whether we should “mute” the client while a paste operation is in progress. After all, it wouldn’t make sense to send further keyboard or mouse input during a paste operation, as that might interfere, or cause other funny behaviour. This would only be relevant for larger amounts of text, though.

I’m happy to help to look into this, though I’m not sure how to coordinate best, or how much time we want to invest.

jotaen4tinypilot avatar Aug 15 '23 12:08 jotaen4tinypilot

It would be worth to dive deeper into the threading setup of Flask and Flask-SocketIO, just to gain a better understanding of why the socket endpoint freezes at some point in situations where the server is busy processing things. For example, why does Flask-SocketIO becomes “blocked” internally, while the rest of the Flask server stays responsive (e.g., the HTTP endpoints)? Or, why is it that socketio.sleep doesn’t have any effect? To me, it looks like there is something off with our threading setup, or with the way Flask and Flask-SocketIO are wired in regards to threading.

This seems like the best route to me.

@jdeanwallace - Can you create a minimal repro with Python-SocketIO of this issue?

I'm thinking we'd create an app like the following:

  1. App spins up a python-socketio server like TinyPilot's
  2. App serves a single index.html page
  3. index.html page has JS that creates a websockets connection and updates the page if the websockets connection ever breaks
  4. index.html's JS sends websockets messages to the backend that cause some blocking process (e.g., sleep, write a 500 MB file)

Hopefully, that reproduces the error in a way that's pretty portable and simple (i.e., doesn't have to be tested on a real Pi).

I suspect that once we create a minimal repro, the solution will soon follow. If it doesn't, the author of python-socketio does short-term consulting, so I think we could show him the repro, and he would be able to identify the issue pretty quickly.

mtlynch avatar Aug 15 '23 12:08 mtlynch

@mtlynch

Can you create a minimal repro with Python-SocketIO of this issue?

Sure, but why not Flask-SocketIO? They're both maintained by same author and TinyPilot uses Flask-SocketIO (which admittedly does use Python-SocketIO under the hood).

jdeanwallace avatar Aug 15 '23 13:08 jdeanwallace

@jdeanwallace - Right, sorry, that's what I meant. I forgot that there was a layer on top of Python-SocketIO.

mtlynch avatar Aug 15 '23 13:08 mtlynch

@mtlynch - You can find the minimal repro project here: https://github.com/tiny-pilot/tinysocket

I'll leave it to you to assign permissions and licenses.

jdeanwallace avatar Aug 15 '23 15:08 jdeanwallace

@jdeanwallace - Nice! I like the name. Good clean implementation, too.

I'd like to adjust it a little bit to get confidence the issue it reproduces is the same as what we're observing here, so I filed https://github.com/tiny-pilot/tinysocket/issues/2.

mtlynch avatar Aug 15 '23 17:08 mtlynch

2023-08-23 – What we know so far

We created a minimal repro of the reported issue via https://github.com/tiny-pilot/tinysocket, which removed many TinyPilot complexities from the issue and helped us better understand Flask-SocketIO behavior. However, the current repro might just be the upper limit of what Flask-SocketIO can handle. The repro does display similar symptoms as the reported issue (i.e., the client being disconnected then reconnected when sending many WebSocket messages), but under different conditions (i.e., requires 30x more WebSocket messages).

Current understanding:

  • The TinyPilot Flask server is single threaded
  • Both HTTP requests and WebSocket messages are handled by the single threaded Flask server
  • On every keystroke message that the WebSocket server receives, the server spawns a process that blocks the main thread for up to 500ms before responding. https://github.com/tiny-pilot/tinypilot/blob/edb44415699c1a35925933026e9d86a26f032104/app/hid/write.py#L81-L86
  • Looking at the Flask-SocketIO example app, it seems like TinyPilot isn't making proper use of eventlet's coroutines (i.e., green threads) because we never actually call socketio.start_background_task. This suggests that we're blocking the main Flask thread.

Next steps:

  1. As suggested by https://github.com/tiny-pilot/tinysocket/issues/2, we mimic the TinyPilot behavior more closely by spawning processes that block for up to 500ms to repro the issue with less WebSocket messages in order to match the conditions of the reported issue.
  2. Fix/improve the confirmed repro in step (1) by properly utilizing eventlet's green threads

jdeanwallace avatar Aug 23 '23 13:08 jdeanwallace

Thanks, @jdeanwallace! This is a helpful writeup!

mtlynch avatar Aug 23 '23 20:08 mtlynch