Etherpad keeps reconnecting after pasting large text

Open • dbauer91 opened this issue 3 years ago • 24 comments

Describe the bug
After copy-pasting a lengthy text from a text document into a pad, the pad keeps reconnecting periodically, sometimes throwing warnings and/or errors in the browser's JS console. Trying to write in the pad after pasting causes 502 errors in that console. On reloading the page, the pasted text and any changes made after pasting are gone: they were never saved.

The text I used came from a 'Lorem Ipsum generator'. The number of lines does not seem to matter: above roughly 10000 characters, the behavior described occurs whether the text spans multiple lines or a single one.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'https://dark.etherpad.com/' and generate a new pad with a random name.
  2. Copy a text with a length greater than 10000 characters.
  3. Paste the text into the generated pad. (Optional step 3.5: try to write in the pad.)
  4. See the reconnect messages and the JS console warnings/errors.

Expected behavior
The text is pasted either without error or with a warning that this might take a moment due to its length.

Screenshots: [EP1] [EP2]

Server (please complete the following information):
Both on dark.etherpad.com and on a personal server:

  • Etherpad version: 1.8.12
  • OS: Dockerized Debian Buster Slim
  • Node.js version (node --version): 14.15
  • npm version (npm --version): 6.14.9

as well as:

  • Etherpad version: 1.8.9
  • OS: Dockerized Debian Buster Slim
  • Node.js version (node --version): 14.15
  • npm version (npm --version): 6.14.9

Version 1.8.7 of etherpad is seemingly not affected.

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.02
  • Browser: Chrome (88.0.4324.182 64-bit) or Firefox (86.0 64-bit)

Additional Information:
Copy-pasting smaller pieces of text in quick succession also seems to cause similar errors. The test case here was ~1200 characters.

dbauer91 • Mar 16 '21 12:03

See settings.json to change the limit.

JohnMcLear • Mar 16 '21 12:03

This is an intentional change in 1.8.9, made because of a socket.io vulnerability. Our error-message handling could be vastly improved, though...

JohnMcLear • Mar 16 '21 12:03

I'll give it a try by tomorrow at the latest and keep you posted. Thank you for the quick reply!

dbauer91 • Mar 16 '21 12:03

Yep, works perfectly, thanks again for the help and sorry for the inconvenience.

dbauer91 • Mar 17 '21 10:03

Leaving open as we still have an action to provide a useful socket disconnect reason.

JohnMcLear • Mar 17 '21 10:03

Well, that's quite annoying. My workflow for the radio show I host is to keep a pad with all possible subjects, references… and the show templates; I cut and paste whatever isn't for the show we're about to record into a new pad. That paste exceeds 100 KB and freaks out Etherpad, whereas it worked perfectly for years…

mmuman • Mar 18 '21 12:03

Actually, I think this happens on Etherpad as well; at least, that's what Framapad uses, IIRC.

mmuman • Mar 18 '21 12:03

You have options:

  1. Increase the allowed copy/paste size in your settings.json file.

  2. Use import instead of copy/paste.

  3. Use a template plugin

And yeah, it's annoying!

JohnMcLear • Mar 18 '21 12:03

Well, I don't own the instance I use; it's run by Framasoft. And I bet other people will bump into this on public instances. Importing is more involved: you have to export, import into the other pad, select the text to remove… I could automate part of that with a script, but normal people won't. Can't the paste be split into chunks in the code? Probably not great from an atomicity perspective…

mmuman • Mar 18 '21 12:03

Which setting is it? I noticed that everything is fine with ordinary prose text, but pasting nested lists will still ~~fail~~ need to be split into chunks in order to work in all of my browsers.

dertuxmalwieder • Mar 18 '21 12:03

Look for socketIo in settings.json. I'm AFK, but it's under that.

JohnMcLear • Mar 18 '21 13:03

It's this section in settings.json (which you might need to copy from settings.json.template, since it's a fairly recent addition):

{
  "socketIo": {
    "maxHttpBufferSize": 10000
  }
}

That matches the "roughly above 10000 characters" in the original post above (though it looks like the actual limit for pasted plain text is closer to 9 KB, due to message encoding).
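To see why the usable paste size lands below the configured byte limit, here is a rough TypeScript sketch that estimates the size of the frame a paste produces. It assumes the client sends each change with socket.io v2's `send()`, which goes on the wire as `42` followed by the JSON array `["message", msg]`, and it uses a simplified, hypothetical message envelope: `buildInsertMessage`, `estimateFrameBytes`, `utf8ByteLength` and `repeatText` are illustrative helpers, and the changeset string only imitates Etherpad's format. The pasted text travels inside a JSON string, so envelope fields, escaped characters such as newlines and quotes, and multi-byte UTF-8 characters all eat into the budget.

```typescript
// UTF-8 byte count of a string (what a WebSocket text frame actually carries).
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length &&
               s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff) {
      bytes += 4; // surrogate pair: one astral code point
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Build sample text by repetition.
function repeatText(unit: string, times: number): string {
  let out = "";
  for (let i = 0; i < times; i++) out += unit;
  return out;
}

// Hypothetical, simplified "user inserted text" message. The envelope and the
// changeset string only imitate Etherpad's shapes; the point is that the
// pasted text travels inside a JSON string.
function buildInsertMessage(pasted: string): object {
  const n = pasted.length.toString(36);
  return {
    type: "COLLABROOM",
    component: "pad",
    data: {
      type: "USER_CHANGES",
      baseRev: 0,
      changeset: "Z:1>" + n + "+" + n + "$" + pasted,
      apool: { numToAttrib: {}, nextNum: 0 },
    },
  };
}

// Assumed socket.io v2 wire form of socket.send(msg): "42" + ["message", msg] as JSON.
function estimateFrameBytes(msg: object): number {
  return utf8ByteLength("42" + JSON.stringify(["message", msg]));
}

const limit = 10000; // socketIo.maxHttpBufferSize default in settings.json
const samples: [string, string][] = [
  ["plain ASCII", repeatText("Lorem ipsum ", 800)],    // 9600 characters
  ["with newlines", repeatText("Lorem ipsum\n", 800)], // 9600 characters
  ["accented", repeatText("é", 5000)],                 // 5000 characters
];
for (const [label, text] of samples) {
  const bytes = estimateFrameBytes(buildInsertMessage(text));
  console.log(`${label}: ${text.length} chars -> ~${bytes} bytes, over limit: ${bytes > limit}`);
}
```

Under these assumptions, the 9600-character ASCII sample still fits within 10000 bytes, while the same length with 800 newlines (each escaped as two bytes), or 5000 two-byte accented characters, does not.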

I increased it and restarted my server, and now I'm able to paste larger chunks of text. Thanks for pointing that out, @JohnMcLear.

As a potential improvement in how Etherpad handles this situation, perhaps the server could send the maxHttpBufferSize value to the client, and the client could then subdivide the paste operation into multiple buffers? That way, server operators could keep maxHttpBufferSize small (to ward off DoS attacks) while still allowing users to make large pastes. (Edit: I see now that @mmuman already suggested that above.)
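A minimal sketch of the client-side splitting idea, assuming the client has learned the server's limit (for example from the server at connect time, as suggested above) and would send each piece as its own change. `splitByByteBudget` and `jsonUtf8Cost` are hypothetical helpers, and the per-message `overheadBytes` is an assumed estimate that a real client would have to measure. Sizes are counted after JSON escaping and UTF-8 encoding, and pieces never cut a surrogate pair in half.

```typescript
// Bytes the code point at index i costs once JSON-escaped and UTF-8 encoded,
// plus how many UTF-16 code units it occupies in the JS string.
function jsonUtf8Cost(s: string, i: number): { bytes: number; units: number } {
  const c = s.charCodeAt(i);
  if (c === 0x22 || c === 0x5c) return { bytes: 2, units: 1 }; // \" and \\
  if (c < 0x20) {
    // \b \t \n \f \r get two-character escapes; other control chars become \u00XX
    const short = c === 0x08 || c === 0x09 || c === 0x0a || c === 0x0c || c === 0x0d;
    return { bytes: short ? 2 : 6, units: 1 };
  }
  if (c < 0x80) return { bytes: 1, units: 1 };
  if (c < 0x800) return { bytes: 2, units: 1 };
  if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
    const next = s.charCodeAt(i + 1);
    if (next >= 0xdc00 && next <= 0xdfff) return { bytes: 4, units: 2 };
  }
  if (c >= 0xd800 && c <= 0xdfff) return { bytes: 6, units: 1 }; // lone surrogate, escaped as \uXXXX
  return { bytes: 3, units: 1 };
}

// Split pasted text so that each piece, plus an estimated per-message overhead,
// stays within the server's limit.
function splitByByteBudget(text: string, maxBytes: number, overheadBytes: number): string[] {
  const budget = maxBytes - overheadBytes;
  if (budget < 6) throw new RangeError("limit leaves no room for even one escaped character");
  const chunks: string[] = [];
  let start = 0;
  let used = 0;
  let i = 0;
  while (i < text.length) {
    const cost = jsonUtf8Cost(text, i);
    if (used + cost.bytes > budget) {
      chunks.push(text.slice(start, i)); // close the current piece before this code point
      start = i;
      used = 0;
    }
    used += cost.bytes;
    i += cost.units;
  }
  if (start < text.length) chunks.push(text.slice(start));
  return chunks;
}

// Usage with a tiny limit so the split is visible.
const pieces = splitByByteBudget("Lorem ipsum dolor sit amet.\n", 40, 30);
console.log(pieces);
```

Splitting on byte counts alone can cut through words or lines; a real implementation might prefer line boundaries. It also only helps if each piece becomes a separate, valid change rather than a fragment the server has to reassemble.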

smokris • Mar 18 '21 18:03

Thank you. I increased it to 50000, and it looks like our bi-weekly pads (around 9,000 characters each) can now be copy-pasted just fine. :-)

dertuxmalwieder • Mar 18 '21 18:03

Yep, that's what we intend to do if we don't update to socket.io 3. We just haven't had a chance yet, as it was a socket.io CVE...

JohnMcLear • Mar 18 '21 19:03

We're hitting this on our FreeBSD etherpad instance.

Until a complete and permanent solution can be designed and implemented, could we in the meantime catch the oversized request and return an appropriate message to the user/admin, possibly with a breadcrumb, mention, or link to the setting to modify, for release in the next point release?

koobs • Jul 30 '21 02:07

I agree with @koobs here.

dertuxmalwieder • Jul 30 '21 12:07

Idea/braindump: To build on the simplest UX improvement (catching the condition and returning a useful error message once), could we, again until a more complete design proposal is implemented ...

  • Catch the condition, and
    • Paste/return/process the value up to maxHttpBufferSize with an error (Default)
  • Optionally, include a user setting for handling the condition, like: HttpBufferSizeExceededHandler = error | null, where
    • an empty value processes the value up to maxHttpBufferSize without an error
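The proposed handler can be sketched as follows. The `error | null` values come from the proposal above; `handleOversizedPaste`, `truncateToBytes` and the result shape are hypothetical, and the byte count ignores JSON and envelope overhead, so in practice the budget passed in would have to sit somewhat below maxHttpBufferSize. Because socket.io drops an oversized message without telling Etherpad, a check like this would have to run in the client before sending.

```typescript
// Values proposed for the hypothetical HttpBufferSizeExceededHandler setting:
// "error" = keep what fits and tell the user; null = keep what fits silently.
type ExceededHandler = "error" | null;

interface PasteResult {
  accepted: string; // prefix of the paste that fits the byte budget
  dropped: number;  // UTF-16 code units cut off the end
  error?: string;   // user-facing message, only in "error" mode
}

// Longest prefix of `text` whose UTF-8 size is <= maxBytes, without splitting a surrogate pair.
function truncateToBytes(text: string, maxBytes: number): string {
  let used = 0;
  let i = 0;
  while (i < text.length) {
    const c = text.charCodeAt(i);
    const isPair = c >= 0xd800 && c <= 0xdbff && i + 1 < text.length &&
      text.charCodeAt(i + 1) >= 0xdc00 && text.charCodeAt(i + 1) <= 0xdfff;
    const cost = c < 0x80 ? 1 : c < 0x800 ? 2 : isPair ? 4 : 3;
    if (used + cost > maxBytes) break;
    used += cost;
    i += isPair ? 2 : 1;
  }
  return text.slice(0, i);
}

function handleOversizedPaste(text: string, maxBytes: number, handler: ExceededHandler): PasteResult {
  const accepted = truncateToBytes(text, maxBytes);
  const dropped = text.length - accepted.length;
  if (dropped > 0 && handler === "error") {
    return {
      accepted,
      dropped,
      error: `Your paste was cut to fit the server's ${maxBytes}-byte limit; ` +
        `ask the admin about socketIo.maxHttpBufferSize in settings.json.`,
    };
  }
  return { accepted, dropped };
}

// Usage: a budget of 5 bytes fits two 2-byte "é" characters but not a third.
console.log(handleOversizedPaste("ééé", 5, "error"));
```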

koobs • Jul 31 '21 01:07

Unfortunately, the socket.io server-side code silently disconnects the client when the message size is exceeded—it does not send an error message back to the client. There's no robust way for the client to distinguish a disconnect due to a large message from an ordinary network glitch. Catching the condition isn't trivial, so improving the UX isn't trivial.

Possible ways to detect the condition:

  • The server tells the client about the message size limit when the client connects. The client uses a custom socket.io parser (that wraps the default parser) to inspect the size of each serialized message before it hits the wire.
  • Add state on the server so that the server can notify the client about the dropped large message when it reconnects. Knowing when to clean up that state could be tricky.
  • Track disconnects on the client and add a heuristic to interrupt the infinite loop of disconnects.

I think I prefer the first option.
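The first option can be sketched as a wrapper around the encoder. The `PacketEncoder` interface below is a hypothetical stand-in (the real socket.io-parser Encoder API differs between versions), `SizeCheckingEncoder` is likewise illustrative, and the limit is assumed to have been sent by the server when the client connected. Oversized frames are never sent; instead a callback fires so the UI can show an error, for example one asking the user to reload the pad.

```typescript
// Stand-in for a socket.io packet encoder: turns one packet into the string(s)
// that go on the wire. Hypothetical; real socket.io-parser APIs differ by version.
interface PacketEncoder {
  encode(packet: unknown): string[];
}

// UTF-8 byte count of a string.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length &&
             s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff) { bytes += 4; i++; }
    else bytes += 3;
  }
  return bytes;
}

// Wraps another encoder and refuses to emit any frame the server would reject,
// reporting the problem instead of letting the server silently drop the connection.
class SizeCheckingEncoder implements PacketEncoder {
  constructor(
    private readonly inner: PacketEncoder,
    private readonly maxBytes: number, // limit announced by the server on connect (assumed)
    private readonly onTooLarge: (bytes: number, limit: number) => void,
  ) {}

  encode(packet: unknown): string[] {
    const frames = this.inner.encode(packet);
    for (const frame of frames) {
      const bytes = utf8ByteLength(frame);
      if (bytes > this.maxBytes) {
        this.onTooLarge(bytes, this.maxBytes);
        return []; // send nothing; the caller must treat the change as not sent
      }
    }
    return frames;
  }
}

// Usage with a toy JSON encoder in place of the real parser.
const jsonEncoder: PacketEncoder = { encode: (packet) => ["42" + JSON.stringify(packet)] };
let lastWarning = "";
const guarded = new SizeCheckingEncoder(jsonEncoder, 10000, (bytes, limit) => {
  lastWarning = `This change is ${bytes} bytes but the server accepts at most ${limit}. ` +
    `Please reload the pad and paste smaller pieces.`;
});
console.log(guarded.encode(["message", { hello: "world" }]));
```

Returning an empty frame list keeps the sketch short; a real wrapper would also have to tell the collaboration code that the change was not sent, so it does not wait indefinitely for the server to accept it.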

rhansen • Jul 31 '21 02:07

First option sounds nice. Could that, in principle, be extended to split a message that exceeds maxHttpBufferSize into multiple chunks smaller than the max?

Edit: Or indeed have the client chunk messages above a certain size, so that the server would not need a higher-than-default maxHttpBufferSize at all.

koobs • Jul 31 '21 23:07

> First option sounds nice. Could that be in principle extended to split/chunk a message that exceeds the maxHttpBufferSize into multiple smaller than the max chunks?

Not really. I see two possible approaches to fixing this issue:

  • Break up the large message into fragments that are reassembled on the server before being processed. This is undesirable because it defeats the purpose of the message size limit (it leaves the server vulnerable to DoS). It's also difficult to implement in a robust manner.
  • Break up the operational transformation (OT) changeset into multiple changesets and send them one at a time. This requires domain-specific knowledge that I currently lack.
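A toy illustration of the second approach: one insert split into several consecutive inserts that, applied in order to an otherwise unchanged document, give the same result. The `Insert` type and both functions are hypothetical and far simpler than Etherpad's changesets, which also carry keep/delete operations and attributes.

```typescript
// Toy model of a changeset that inserts `text` at character offset `pos`.
interface Insert {
  pos: number;
  text: string;
}

function applyInsert(doc: string, op: Insert): string {
  if (op.pos < 0 || op.pos > doc.length) throw new RangeError("insert position out of range");
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Break one large insert into consecutive smaller inserts. Each piece starts where
// the previous one ended, so applying them in order reproduces the original insert.
function splitInsert(op: Insert, maxChars: number): Insert[] {
  if (maxChars < 2) throw new RangeError("maxChars must be >= 2 so a surrogate pair fits");
  const parts: Insert[] = [];
  let offset = 0;
  while (offset < op.text.length) {
    let end = Math.min(offset + maxChars, op.text.length);
    const last = op.text.charCodeAt(end - 1);
    if (end < op.text.length && last >= 0xd800 && last <= 0xdbff) end--; // keep surrogate pairs together
    parts.push({ pos: op.pos + offset, text: op.text.slice(offset, end) });
    offset = end;
  }
  return parts;
}

// Usage: the stepwise result matches the single large insert.
const doc = "Show notes:\n\n";
const paste: Insert = { pos: 12, text: "Topic A\nTopic B 🎙️\nTopic C\n" };
const oneShot = applyInsert(doc, paste);
let stepwise = doc;
for (const piece of splitInsert(paste, 5)) stepwise = applyInsert(stepwise, piece);
console.log(stepwise === oneShot); // true
```

The hard part is everything the toy model leaves out. In a live pad, other users' edits can arrive between the pieces, so each piece would have to go through the normal transformation path like any other changeset; collaborators would see the paste arrive in parts (the atomicity concern raised earlier in the thread); and each piece would presumably become its own revision in the history.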

If it were up to me, I'd probably address this while switching from OT to CRDT (e.g., Yjs), but that would be a huge task.

For the foreseeable future I only see us marginally improving the UX by displaying an error that forces the user to refresh the pad and try again. Even this is turning out to be more difficult than I had expected: Before we can wrap the socket.io-parser we have to pick one of these not-so-small yaks and shave it:

  • Add alias support to require-kernel and yajsml so that require('foo') can load foo's main .js file. (This seemed like the least unpleasant option, so I started working on it. See ether/etherpad-yajsml#13 and ether/etherpad-require-kernel#15.)
  • Migrate away from require-kernel and yajsml (@webzwo0i has state on this; see #4820).
  • Use something like browserify to bundle socket.io-parser for use in browsers.
  • Upgrade to socket.io 3.x or later (see #4916).

rhansen • Aug 09 '21 10:08

Slightly related to this: #5180 will enable changing the socket limit on Docker containers via an env variable, i.e. without having to rebuild the image.
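For reference, Etherpad's settings.json accepts `${ENV_VAR}` and `${ENV_VAR:default}` placeholders, which is one way a Docker image can expose a setting as an environment variable. The sketch below shows how such a placeholder could be resolved; `resolveSetting` and `coerce` are hypothetical, the coercion rules are simplified assumptions, and the variable name `SOCKETIO_MAX_HTTP_BUFFER_SIZE` is an assumption too, not necessarily what the change mentioned above uses (check the Docker documentation for your version).

```typescript
// Simplified coercion of a substituted string (assumed rules): "null", "true",
// "false" become literals and numeric strings become numbers.
function coerce(raw: string): string | number | boolean | null {
  if (raw === "null") return null;
  if (raw === "true") return true;
  if (raw === "false") return false;
  if (raw.trim() !== "" && !isNaN(Number(raw))) return Number(raw);
  return raw;
}

// Resolve "${NAME}" or "${NAME:default}" against an environment map.
function resolveSetting(
  value: string,
  env: { [name: string]: string | undefined },
): string | number | boolean | null {
  const m = /^\$\{([^:}]+)(?::(.*))?\}$/.exec(value);
  if (m === null) return value; // not a placeholder: keep the literal value
  const fromEnv = env[m[1]];
  if (fromEnv !== undefined) return coerce(fromEnv);
  if (m[2] !== undefined) return coerce(m[2]);
  return null; // unset and no default (assumed behavior)
}

// Usage: the variable name is an assumption.
const placeholder = "${SOCKETIO_MAX_HTTP_BUFFER_SIZE:10000}";
console.log(resolveSetting(placeholder, {}));                                          // 10000
console.log(resolveSetting(placeholder, { SOCKETIO_MAX_HTTP_BUFFER_SIZE: "200000" })); // 200000
```

In Node you would pass `process.env` as the environment map.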

JustAnotherArchivist • Sep 16 '21 20:09

I'd like to add that I experienced this as something more like a data-loss event: after pasting, I closed the tab and shared the pad with someone else, and they saw a blank document. Anything else I had typed would also have been lost. It seems pretty important to at least have some kind of indicator that "this text hasn't been synced". (Which might be a different issue, I suppose.)
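An unsynced indicator could be driven by bookkeeping like the following hypothetical sketch, which also covers the third detection option mentioned earlier in the thread (a client-side heuristic that notices a change stuck across repeated reconnects). `UnsyncedChangeTracker`, its states and its thresholds are assumptions, not Etherpad's client code; times are passed in explicitly to keep it testable.

```typescript
type SyncState = "synced" | "pending" | "stuck";

// Tracks whether a sent change has been acknowledged by the server.
class UnsyncedChangeTracker {
  private pendingSince: number | null = null;
  private reconnectsWhilePending = 0;

  constructor(
    private readonly stuckAfterMs: number,  // how long a change may stay unacknowledged
    private readonly maxReconnects: number, // reconnects tolerated while a change is pending
  ) {}

  changeSent(now: number): void {
    if (this.pendingSince === null) this.pendingSince = now;
  }

  changeAcknowledged(): void {
    this.pendingSince = null;
    this.reconnectsWhilePending = 0;
  }

  reconnected(): void {
    if (this.pendingSince !== null) this.reconnectsWhilePending++;
  }

  state(now: number): SyncState {
    if (this.pendingSince === null) return "synced";
    if (now - this.pendingSince >= this.stuckAfterMs ||
        this.reconnectsWhilePending >= this.maxReconnects) {
      return "stuck";
    }
    return "pending";
  }
}

// Usage: one reconnect shortly after sending is still just "pending".
const tracker = new UnsyncedChangeTracker(5000, 3);
tracker.changeSent(0);
tracker.reconnected();
console.log(tracker.state(1000)); // pending
```

When the state turns `stuck`, the UI could show a warning banner and, for example, ask for confirmation before the tab is closed.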

timmc • Sep 06 '22 13:09

I agree with that. We're hosting an instance that's primarily used by authors, and we regularly get reports that pads are losing their content. So far, every case has turned out to be an overly long copy-pasted text (from external writing software like Scrivener); usually the users wanted to share it for proofreading and copy editing (which works well thanks to author colors).

oe1rfc • Sep 06 '22 15:09

Agreed. I hit this when copy-pasting content from one etherpad to another. Not only did the new pad silently fail to copy the text, but after I deleted the text from the original etherpad, I couldn't get it back with Ctrl-Z (it looked like it was back, but again it silently failed to restore the original pad). On top of that, the version history somehow doesn't seem to be working properly. The only reason I didn't lose the entire pad was that I was able to get the data from my browser window and make an offline copy. This is far more serious than a minor UI bug.

markhpc • Jan 12 '23 15:01