irc-core Line Splitting

I've noticed that when I send overlong lines, they show up in (non-native) logs as having been split into two. I presume this is something glirc is doing in the background due to IRC line length limits, and it's then hiding the split in the display and the native logs. That's fair, but I'm not satisfied with the current behaviour: it shouldn't be splitting in the middle of a word.

Proposed solutions:

1. Split on the last space instead.
2. Represent the split in the edit box as, say, | or $^, so the lines can be adjusted around it.
3. Just stop taking input, forcing the second line to actually be written in a second line.
4. Soften 3 by continuing to accept input, instead highlighting the post-limit text in red.

I believe the second option is the best. The first doesn't offer much control to the user, and could still split in places they don't want it to. The third and fourth do, but force an awkward UX since you have to navigate through the history and send the lines manually. There's too much potential to accidentally duplicate, omit or reorder lines.

Version: glirc-2.39.0.1

Mar 07 '23 10:03 LSLeary

It's pretty hard to know when you should split a line and how long the line can be. The maximum message length varies by the channel you're in and what your hostname currently is. This can even change while you're connected due to a message from the server. The message limits are measured in bytes and have to be computed after conversion to UTF-8. The logic for this currently runs just before the message goes out through the network layer.
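To make the byte accounting concrete, here is a rough sketch of how such a budget could be computed. It assumes the classic 512-byte IRC line limit (CR LF included) and that the server relays a message to others as ":nick!user@host PRIVMSG target :body". The names maxLineBytes and privmsgBudget are hypothetical and are not taken from glirc.

```haskell
import qualified Data.ByteString as B
import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Assumed overall line limit in bytes, including the trailing CR LF.
maxLineBytes :: Int
maxLineBytes = 512

-- | Bytes left for the message body once the relayed prefix, the command,
-- the target and the CR LF are accounted for. Arguments are the nick, user
-- and host as other clients see them, then the target. (Hypothetical helper.)
privmsgBudget :: Text -> Text -> Text -> Text -> Int
privmsgBudget nick user host target = maxLineBytes - overhead
  where
    prefix = T.concat
      [ T.pack ":", nick, T.pack "!", user, T.pack "@", host
      , T.pack " PRIVMSG ", target, T.pack " :" ]
    -- Measure after UTF-8 encoding, then add 2 for CR LF.
    overhead = B.length (TE.encodeUtf8 prefix) + 2
```

Because the host and the target both feed into the overhead, the budget shifts whenever the visible hostname changes or the message goes to a different channel.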

I've thought about better solutions, but doing something nicer would require either a pretty big hack, underestimating the max line length, or a lot of refactoring.

I'd entertain a discussion and a PR, but I don't expect to be able to address this very soon.

Jul 03 '23 01:07 glguy

There may be room for improvement here without resorting to any of the above.

The current logic for message splitting appears to live in Client.State.Network.utf8ChunksOf. AIUI it iterates over the indices of UTF-8 code points, gets the last index that is less than the maximum length, and then repeats with the remainder of the text as long as that remainder is non-empty.

A possible naive improvement on this might be to store the index of the last space, comma, or hyphen while iterating over byte indices, and split at the index after that where possible. It would be fairly conservative and it would be nicer to split on sentence boundaries where possible, but at least it doesn't seem to me like this approach would cause especially large headaches to implement.
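A minimal sketch of that idea, working on Text rather than on the real utf8ChunksOf (whose exact signature I'm not reproducing here). The names utf8Width, fitChars, isBreak and splitChunks are made up for illustration, and the budget is assumed to be at least 4 bytes so that any single character fits.

```haskell
import           Data.Char (ord)
import           Data.Text (Text)
import qualified Data.Text as T

-- | Number of bytes a character occupies once encoded as UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- | Length in characters of the longest prefix that fits in the byte budget.
fitChars :: Int -> Text -> Int
fitChars budget = go 0 0 . T.unpack
  where
    go used n (c:cs)
      | used + utf8Width c <= budget = go (used + utf8Width c) (n + 1) cs
    go _ n _ = n

-- | Characters after which a split is considered acceptable.
isBreak :: Char -> Bool
isBreak c = c == ' ' || c == ',' || c == '-'

-- | Split into chunks of at most 'budget' UTF-8 bytes, cutting just after
-- the last break character when the chunk contains one.
splitChunks :: Int -> Text -> [Text]
splitChunks budget txt
  | T.null txt        = []
  | T.length txt == n = [txt]                       -- the remainder fits
  | otherwise         = chunk : splitChunks budget rest
  where
    n    = max 1 (fitChars budget txt)              -- always make progress
    hard = T.take n txt
    soft = T.dropWhileEnd (not . isBreak) hard      -- ends at the last break
    cut  = if T.null soft then n else T.length soft -- no break: hard cut
    (chunk, rest) = T.splitAt cut txt
```

With a budget of 10, "hello world foo" comes out as "hello " and "world foo". The break character stays at the end of the first chunk, so concatenating the chunks gives back the original text.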

I'll try implementing this and see how it goes.

Aug 28 '23 00:08 TheDaemoness

OK. My initial thought is that we shouldn't shorten the message to under 400 characters, but if we found a period in the last 110 bytes, or something like that, we could tolerate breaking a little early.
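One way the "tolerate breaking a little early" rule might look on already-encoded bytes, as a sketch only. The window size is a parameter rather than a fixed 110, and isCont, hardCut and preferredCut are hypothetical names. The limit is assumed to be positive.

```haskell
import           Data.Bits ((.&.))
import qualified Data.ByteString as B
import           Data.ByteString (ByteString)
import           Data.Word (Word8)

-- | True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
isCont :: Word8 -> Bool
isCont w = w .&. 0xC0 == 0x80

-- | Largest offset not exceeding 'limit' that does not split a code point.
hardCut :: Int -> ByteString -> Int
hardCut limit bs
  | B.length bs <= limit = B.length bs
  | otherwise            = go limit
  where
    -- Cutting at i is only safe if the byte at i starts a code point.
    go i | i > 0 && isCont (B.index bs i) = go (i - 1)
         | otherwise                      = i

-- | Cut just after the last period found within 'window' bytes before the
-- hard cut; fall back to the hard cut if there is none.
preferredCut :: Int -> Int -> ByteString -> Int
preferredCut limit window bs
  | hard == B.length bs = hard              -- everything fits, no split
  | null candidates     = hard
  | otherwise           = last candidates
  where
    hard       = hardCut limit bs
    candidates = [ i + 1
                 | i <- [max 0 (hard - window) .. hard - 1]
                 , B.index bs i == 0x2E ]   -- 0x2E is '.'
```

A period further back than the window is ignored, so the message is never shortened by more than the window size.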

Big bonus points if you run this logic in the editor somehow so we can visually indicate where message breaks will occur!!

Aug 28 '23 15:08 glguy

suggestion: when sending a message, if it's too long for the network, raise an exception in the network layer and handle it somewhere higher up where the Text (or whatever) representation is still being used. keep cutting words until it goes through. If a single word is too long, start cutting characters. If there's nothing left to cut, raise the exception.

of course something other than exceptions can work too, this is just an example.
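for instance, a rough sketch with Either instead of exceptions. Send is a hypothetical stand-in for whatever the network layer exposes, and sendTrimmed and shorten are made-up names, none of this is glirc's actual API.

```haskell
import           Data.Text (Text)
import qualified Data.Text as T

-- | Hypothetical network-layer send: Left means the line was too long.
type Send = Text -> Either String ()

-- | Drop the last word; if only one word is left, drop its last character.
-- Only called on non-empty text, and always returns something shorter.
shorten :: Text -> Text
shorten t
  | T.null wordsCut = T.init t
  | otherwise       = wordsCut
  where
    wordsCut = T.stripEnd (T.dropWhileEnd (/= ' ') t)

-- | Retry with shorter and shorter prefixes until one goes through.
-- Returns the part that was sent and the part still left to send.
sendTrimmed :: Send -> Text -> Either String (Text, Text)
sendTrimmed send full = go full
  where
    go attempt
      | T.null attempt = Left "nothing left to cut"
      | otherwise = case send attempt of
          Right () -> Right (attempt, T.drop (T.length attempt) full)
          Left _   -> go (shorten attempt)
```

the caller would then loop on the leftover part until it's empty.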

Dec 31 '23 16:12 cheater
