Support soft wrap

Open kellytk opened this issue 4 years ago • 28 comments
As discussed on Matrix.

Kakoune's default of not wrapping lines is confusing. Typically I'd appreciate the austere design: don't wrap until a newline is encountered. However, I think this is an exceptional case, because there are constraints on the document outside of the user's control, namely the viewport. A size constraint is imposed on the user, and given that, I'd default to wrapping in Helix.

kellytk avatar Jun 06 '21 03:06 kellytk

I'd tentatively like to take this on, based on my experience with Led. But in case anyone gets to it before me, I want to leave some notes about Led's approach.

Here's a short demo of Led in the face of large files and crazy-long lines:

http://perm.cessen.com/2021/helix/led_demo.mp4

The core principle that enables this is to always do things in ways that keep calculations local to a given area of the text. The specific techniques I've used in Led are:

  1. The view position is specified by a simple char-offset into the text, not by visual line. This lets the display code jump directly to the vicinity of the content that should be on screen, without needing any information about how it will be displayed. The wrapping etc. can then be done locally afterwards, in that vicinity, and everything positioned on screen based on that. This is efficient enough to be done on-the-fly every time it's needed (including for e.g. vertical cursor movement calculations), with no caching needed. However, with this alone it still needs to calculate soft-wrapping starting at the beginning of the line that contains the content, which obviously won't work in real time for very long lines. And that brings us to...
  2. Chunking overly-long lines. This can be done with only local calculations as well, including a search for a good break point (e.g. white space, or at least grapheme boundary) in the vicinity of the proposed chunk boundary. A soft line break is then made at the end of every chunk. This places an upper-bound on how far back the soft wrapping code needs to go: at worst, it's the length of a chunk. (In Led, the chunk size is hard-coded to 4096 chars, but it would be easy to make configurable.)
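
A minimal sketch of the chunking idea in point 2, not Led's actual code. The names `chunk_break` and `chunk_start` and the `window` parameter are made up for illustration, a `&[char]` slice stands in for a rope with cheap char indexing, and the fallback break is not grapheme-aware here. Proposed boundaries sit at multiples of `chunk_size`, and each real break is found by looking at most `window` chars back from its proposed boundary, so every query stays local:

```rust
/// Break position for the `k`-th proposed chunk boundary (k >= 1) of a line,
/// or `None` if the line is too short to have that boundary.
/// Only the `window` chars before the proposed boundary are inspected.
fn chunk_break(line: &[char], k: usize, chunk_size: usize, window: usize) -> Option<usize> {
    assert!(chunk_size > 0);
    let proposed = k * chunk_size;
    if k == 0 || proposed >= line.len() {
        return None;
    }
    // Clamp the window so that consecutive breaks can never cross each other.
    let lowest = proposed - window.min(chunk_size - 1);
    for i in (lowest..proposed).rev() {
        if line[i].is_whitespace() {
            return Some(i + 1); // break right after the whitespace
        }
    }
    Some(proposed) // no whitespace nearby: break at the proposed offset
}

/// Char offset where the chunk containing `offset` starts. This is the
/// furthest back that soft-wrap calculation would ever need to go.
fn chunk_start(line: &[char], offset: usize, chunk_size: usize, window: usize) -> usize {
    assert!(chunk_size > 0);
    let k = offset / chunk_size;
    // The next boundary's break may already lie at or before `offset`,
    // because breaks are searched for *before* their proposed boundary.
    if let Some(b) = chunk_break(line, k + 1, chunk_size, window) {
        if b <= offset {
            return b;
        }
    }
    chunk_break(line, k, chunk_size, window).unwrap_or(0)
}
```

With this scheme no chunk is longer than `chunk_size + window` chars, and `chunk_start` inspects at most two windows regardless of how long the line is.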

The downsides to this approach are that (A) the editor has no concept of absolute visual vertical position for use with e.g. scroll bars, and (B) there are periodic soft line breaks at the chunk boundaries of over-long lines.

I don't think issue A is a problem for a console editor. And even for a GUI editor it just slightly changes the meaning of the scroll bar: you're scrolling through content rather than visual lines. Emacs actually calculates view positions this way as well, and it seems to work fine in the GUI version. It's a very subtle difference in scroll-bar behavior in most files.

Issue B is a little more annoying, but it also only kicks in for extreme situations. And those are the same situations where you're starting to make the choice between "perfect wrapping and unusable editor" or "imperfect wrapping and usable editor". Helix could also set the chunk size far higher, so it really only kicks in when it absolutely needs to.

cessen avatar Jun 12 '21 06:06 cessen

Just for my information, what's the progress here?

bestouff avatar Nov 26 '21 21:11 bestouff

Realistically, at this point I doubt I'll get around to this any time soon. Most of my time and motivation is directed at other projects at the moment, and I expect that to be the case for a while.

If someone else wants to take this on, that would be great. I'd be happy to provide some guidance as time allows. Although this probably isn't something for a first-time contributor.

cessen avatar Nov 29 '21 07:11 cessen

I'll try to tackle this.

kirawi avatar Mar 20 '22 20:03 kirawi

I'll repeat here what I said on the matrix channel:

I would recommend doing a first implementation that ignores the chunking aspect of things, since it will work fine without that for the large majority of files anyway (chunking is only needed for very long lines). And get that working first in one PR. This corresponds to point 1 in my description up-thread.

And then after that's working, go back and implement the chunking of very long lines (point 2 up-thread) in a separate PR.

cessen avatar Mar 21 '22 17:03 cessen

That might require adding new commands to be able to move down by display line (like gk and gj in vim) vs actual line.

antoyo avatar Mar 21 '22 17:03 antoyo

Here's an example of how I think soft-wrapping should work: wrapping doesn't break words, it's obvious that something is soft-wrapped, and indentation is preserved.

[Screenshot 2022-04-08--14:35--351]

thomas-profitt avatar Apr 08 '22 20:04 thomas-profitt

Nice, but there may be problems with already heavily indented files, where wrapping will create a thin column stuck on the right.

bestouff avatar Apr 09 '22 10:04 bestouff

@thomas-profitt I agree that soft wrapping should (optionally) preserve indentation. IMO it's really hard to read soft-wrapped source code without that feature.

@bestouff While that can happen, I still think indentation-preserving soft wrap is the better default for source code files. But certainly, it should be something that can be disabled by the user. My implementation in Led actually has two settings for soft wrap:

  1. bool: preserve indentation on wrapping or not.
  2. int: additional number of spaces to indent wrapped lines.

Both options can be mixed and matched. This provides a lot of flexibility in behavior for the user, and isn't especially difficult to implement.
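
A rough sketch of how those two settings could combine, not Led's actual code. The `WrapIndent` struct, the `continuation_indent` function, and the `tab_width` and `view_width` parameters are hypothetical names. The final guard, which drops the virtual indentation when it would eat more than half of the view, is an added assumption aimed at the "thin column on the right" concern and is not one of the two settings:

```rust
/// Hypothetical settings mirroring the two options described above.
#[derive(Clone, Copy)]
struct WrapIndent {
    /// Repeat the line's own leading indentation on wrapped rows.
    preserve_indent: bool,
    /// Additional columns added to every wrapped row.
    extra_indent: usize,
}

/// Columns of virtual indentation for the continuation rows of `line`.
/// `tab_width` must be greater than zero.
fn continuation_indent(line: &str, cfg: WrapIndent, tab_width: usize, view_width: usize) -> usize {
    let mut indent = 0;
    if cfg.preserve_indent {
        for ch in line.chars() {
            match ch {
                ' ' => indent += 1,
                // A tab advances to the next tab stop.
                '\t' => indent += tab_width - indent % tab_width,
                _ => break,
            }
        }
    }
    let total = indent + cfg.extra_indent;
    // Assumption of this sketch: if the indentation would take more than half
    // of the view, fall back to no indentation at all.
    if total * 2 > view_width {
        0
    } else {
        total
    }
}
```

Setting `preserve_indent: false` with a non-zero `extra_indent` gives a plain hanging indent, and setting both to their "off" values gives unindented wrapping.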

cessen avatar Apr 09 '22 18:04 cessen

I think it'd be nice if there was a somewhat convenient keybind to toggle this. I personally have come to quite like hard wrapping by default, but sometimes I want to quickly soft wrap as I am writing something or if I happen to open a file I didn't expect to be a long single line. After I'm done reading/writing, I'd want to toggle it back.

dylrich avatar Apr 20 '22 04:04 dylrich

With #2128 (hopefully) closing soon, I'd like to start talking about possible strategies for supporting soft wrap.

The first thing I'm thinking is that "live" hard wrap (as opposed to the patch in #2128, which is triggered by a command) and soft wrap (which is implicitly live) are kind of the same thing. This is especially true if we want to support preserving indentation and maybe even comments in soft wrap. Though, maybe these features diverge due to one actually modifying the text and the other only the viewport. But maybe those two functions can share a base implementation of "dynamic wrapping" (?).

If this is the route we want to go, we may want to investigate patching the textwrap library to support something like incremental wrapping, where instead of returning a Vec<Cow<str>>, it gives back some kind of iterator over the changed lines. Maybe an impl Iterator<Cow<str>> or something similar. It would be great if the hypothetical incremental wrap command also supported a char range where we could specify that nothing before or after that range should change.


@kirawi I just saw that you've already started on a PR for this. What do you think of this direction? I'm mostly asking about using the textwrap crate along with some kind of patch to allow for incremental wrapping. I'm not at all familiar with how the current viewport implementation works, so maybe this is a bad idea.

EDIT: Now that I think about it more, I'm not sure how well textwrap would work on code. I was mostly thinking about prose.

vlmutolo avatar Apr 26 '22 02:04 vlmutolo

If this is the route we want to go, we may want to investigate patching the textwrap library to support something like incremental wrapping, where instead of returning a Vec<Cow<str>>, it gives back some kind of iterator over the changed lines. Maybe an impl Iterator<Cow<str>> or something similar. It would be great if the hypothetical incremental wrap command also supported a char range where we could specify that nothing before or after that range should change.

Textwrap is actually very line-oriented: that is, it wraps multiple lines by simply wrapping them one by one. This means that you as a caller can save a lot of work if you don't ask it to wrap lines which you know haven't changed.

Perhaps I misunderstood and what you are after is a way to get back the output line-by-line? So you feed a single 200 character line to Textwrap and it gives you back an iterator which will yield the 2-3 wrapped lines? I used to have such a design, but it was complicated to make all features work together... so I changed it to return a Vec instead for simplicity.

Since it's very very fast to wrap a single line of text (I measure some 40 microseconds to wrap a line with 800 characters), I figured returning the fully wrapped result would be okay.

But I would be very happy to hear feedback on this from real-world applications :smile:

mgeisler avatar Jun 03 '22 17:06 mgeisler

Now that I think about it more, I'm not sure how well textwrap would work on code. I was mostly thinking about prose.

Yeah. I suspect the use cases here are different enough that textwrap probably doesn't make sense for Helix. The easy parts of text wrapping are... well, easy, and don't (IMO) justify a dependency. And the hard parts of text wrapping (how we handle indentation, what are considered valid break points, etc.) are also the places where we're likely to differ from textwrap anyway.

It's also worth noting that in an editor the text wrapping code isn't just for display, it's also used for cursor movement, knowing where to place inline compiler errors, and anything else that needs to query the relationship between text offsets and screen position. Those kinds of queries could potentially be built on top of something like textwrap, but it would probably involve a fair bit of shoehorning, and it's yet another thing that differs in our use case.

(This is in no way a knock against textwrap, btw. Being targeted in your use cases rather than trying to be everything to everyone often makes for better, not worse, libraries.)

cessen avatar Jun 04 '22 05:06 cessen

We already merged/released support for the "reflow" command using the textwrap crate. I think the addition of a (relatively small) dependency was well worth it in that case. The purpose of "reflow" is to take prose-like text, such as comments and markdown, and hard-wrap it to a given line width.

Textwrap does a far better job at this than what I had proposed in the PR originally. For that use case, if we wanted to get the same quality reflow as what textwrap provides, we'd basically have to re-implement textwrap.

For other use cases, like soft wrapping the displayed text, textwrap may or may not be the right fit. I'm not sure. But we already have it in the project to use if/where it makes sense.

vlmutolo avatar Jun 04 '22 17:06 vlmutolo

Ah, yeah, that makes sense.

And again, I'm not knocking textwrap at all here. In fact, I was pleasantly surprised to see that e.g. it can be configured to be zero-dependency (I'm used to library crates pulling in the world, which makes me hesitant to pull them in as dependencies even if they otherwise perfectly match my use case). And the optional dependencies it does have seem carefully chosen and worth the features they enable. I think that all speaks well of the engineering sensibilities of the author(s).

I'm just skeptical if it's the right fit for soft wrapping in Helix, for the reasons I outlined above.

cessen avatar Jun 04 '22 19:06 cessen

I don't think it would be the best choice for soft wrapping because graphemes would be iterated over twice: once to calculate the wrapping, and again to render the text. Though that might not be avoidable either way, now that I think about it...

kirawi avatar Jun 04 '22 22:06 kirawi

Hi @cessen, this comment became a bit of an essay... I hope it's useful still :-)

The easy parts of text wrapping are... well, easy, and don't (IMO) justify a dependency.

Yeah, I agree: the simple case is simple. When you know the parameters of your problem, and when you're happy with the normal greedy wrapping (see the documentation of wrap_optimal_fit for an example of a different wrapping algorithm), then it's easy to write the code yourself. I made a quick-and-dirty implementation here just so that I can estimate the size overhead of using Textwrap: binary-sizes/main.rs.

By parameters of the problem, I mean things like:

  • Can the width become less than the width of the shortest word in your paragraph? If so, do you want to break words apart or let them stick out into the margin?
  • Should you support wrapping at hyphens ('-')? What about --foo-bar, where are the legal breakpoints in that word?
  • Wrapping at soft hyphens ('\u{00AD}')? This is not supported by Textwrap, but I hope to add it one day.
  • Should emojis be handled? Textwrap can either use unicode-width for support for all of Unicode, or it can use its own trivial estimation which works for emojis, but which fails for Asian characters.
  • Should the available breakpoints be all ' ' characters only, or do you want to use the unicode-linebreak algorithm? How do you handle multiple spaces between words?

If you fix answers to some of these questions, the problem space shrinks dramatically and you end up with less code. The Textwrap dependencies are all optional, so you can slim it down as needed.

(This is in no way a knock against textwrap, btw. Being targeted in your use cases rather than trying to be everything to everyone often makes for better, not worse, libraries.)

Thanks, I completely get it!

Textwrap tries to be pretty configurable. It started out as a ~20 line crate which implemented the simplest and most naive wrapping you can imagine. I later added options for more and more cases.

Most recently, I made Textwrap handle proportional fonts, which you can see an example of here: https://mgeisler.github.io/textwrap. This uses JavaScript to measure the size of each word, but uses Textwrap to wrap the words into lines. So instead of working on a &str, Textwrap works on what I call "fragments": opaque boxes which have a width followed by whitespace. The internals operate on these fragments, and then there is a layer around that which operates on text. However, the Fragment trait is exposed on purpose to allow other programs to use it directly.

To summarize, if you want to let users transform text into wrapped lines, then Textwrap ought to be useful for that. Examples could be plain text and comments with or without indentation. Textwrap will not work for wrapping code according to an AST, and you would need to build on top of the Fragment trait if you want to wrap something more than a plain &str (such as styled text).

mgeisler avatar Jun 05 '22 10:06 mgeisler

Hi @mgeisler,

Thanks for the essay! Ha ha. It's genuinely appreciated. :-) I've kind of ended up with an essay of my own below.

To answer your question about the parameters:

  • We want to use the simple greedy algorithm, not Knuth or similar. This isn't for performance (btw, kudos on your linear-time implementation!), but rather for UX: globally optimal solutions can cause the editing cursor to jump around unpredictably, because edits later in the paragraph can cause earlier parts to get re-wrapped differently. Knuth wrapping is great for final display of text, but not so much for editing.
  • Yes, we want to handle wrapping of words/segments that are longer than the wrapping line width.
  • Hyphens etc. aren't really an appropriate model for code. Inserting hyphenation is obviously a non-starter for this use case. And in terms of breaking on hyphens, there can be all kinds of punctuation/special characters in code, with varying meaning between programming languages. So it might be appropriate to break on hyphens in one language, but not another. For a first implementation, we'll probably punt on this and only break on white space. But we might get fancier in the future, using knowledge of language syntax.
  • Ideally, all unicode character widths should be handled appropriately for a monospace context. E.g. CJK will generally be double-width, etc. And, of course, grapheme clusters need to be handled correctly.
  • I imagine we'll handle multiple whitespace characters between words similarly to how I've implemented it before: you treat whitespace as being joined to the word that precedes it. This prevents wrapped whitespace from being at the start of lines whenever possible, and gives a clear single break point between words, which simplifies the code.
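
To make the points above concrete, here is a minimal sketch of greedy (first-fit) wrapping where whitespace is joined to the word that precedes it. It is an illustration under stated assumptions, not code from Helix or Led: `soft_wrap` is a made-up name, widths are counted in chars rather than grapheme clusters and display widths, only whitespace is a break point, and trailing whitespace is allowed to overhang the width.

```rust
/// Greedy soft wrap. Returns the display rows as contiguous slices of `line`.
fn soft_wrap(line: &str, width: usize) -> Vec<&str> {
    assert!(width > 0);
    let mut rows = Vec::new();
    let mut row_start = 0; // byte offset where the current row begins
    let mut row_len = 0; // chars already placed on the current row
    let mut pos = 0; // byte offset of the next unplaced segment

    while pos < line.len() {
        // A segment is a word plus the whitespace that follows it.
        let rest = &line[pos..];
        let word_end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let seg_end = rest[word_end..]
            .find(|c: char| !c.is_whitespace())
            .map_or(rest.len(), |i| word_end + i);
        let word_len = rest[..word_end].chars().count();
        let seg_len = rest[..seg_end].chars().count();

        if row_len > 0 && row_len + word_len > width {
            // The word does not fit: close the row and retry on a fresh one.
            rows.push(&line[row_start..pos]);
            row_start = pos;
            row_len = 0;
            continue;
        }
        if word_len > width {
            // The row is empty and the word is wider than the view:
            // split it after exactly `width` chars.
            let split = rest.char_indices().nth(width).map(|(i, _)| i).unwrap();
            rows.push(&line[pos..pos + split]);
            pos += split;
            row_start = pos;
            continue;
        }
        row_len += seg_len;
        pos += seg_end;
    }
    rows.push(&line[row_start..]);
    rows
}
```

Because every row is decided only from the text before it, an edit can never change how earlier rows of the same line are wrapped, which is the cursor-stability property that motivates the greedy choice.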

These are all things I've implemented before in a different editor project, and as long as we handle graphemes appropriately (which is already in Helix), none of the above points are IMO the hard parts of soft wrapping in an editor.

The actual hard parts come from a different set of parameters:

  • Since this is a code editor, we'll want to (optionally) preserve the initial indentation of a line in the subsequent soft wrapped portions of the line. Similarly, we'll want soft-wrapped portions to (optionally) have additional indentation as well.
  • We need to be able to map both from text offset -> screen space position and from screen space position -> text offset. The latter in particular involves knowing which screen space positions are valid (e.g. the aforementioned soft-wrapped indentation isn't real text, so the cells it occupies are not valid positions), what "closest valid position" should mean, how wide tab stops are, etc. How we do this influences not just display but also editor behavior, so I suspect we'll want tight control over how this works.
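
A deliberately simplified sketch of the two mappings, to show where the "valid position" question comes from. Everything here is an assumption made for illustration: the `Layout` struct and its methods are hypothetical, wrapping is purely character-based (every row is filled completely), each char is one column wide, and "closest valid position" is taken to mean snapping right to the first real text column.

```rust
/// Hypothetical layout of one wrapped line: `width` columns per row, with
/// `indent` columns of virtual indentation on every continuation row.
struct Layout {
    width: usize,
    indent: usize,
}

impl Layout {
    /// Text columns available on a continuation row.
    fn cont_cap(&self) -> usize {
        assert!(self.indent < self.width);
        self.width - self.indent
    }

    /// Char offset within the line -> (row, col) on screen.
    fn offset_to_pos(&self, offset: usize) -> (usize, usize) {
        if offset < self.width {
            return (0, offset);
        }
        let rem = offset - self.width;
        (1 + rem / self.cont_cap(), self.indent + rem % self.cont_cap())
    }

    /// (row, col) on screen -> closest valid char offset, clamped to `line_len`.
    fn pos_to_offset(&self, row: usize, col: usize, line_len: usize) -> usize {
        let offset = if row == 0 {
            col.min(self.width - 1)
        } else {
            // Cells inside the virtual indentation are not real text,
            // so snap to the first real column of the row.
            let col = col.max(self.indent).min(self.width - 1);
            self.width + (row - 1) * self.cont_cap() + (col - self.indent)
        };
        offset.min(line_len)
    }
}
```

Moving down by one display row can then be expressed as `offset_to_pos` followed by `pos_to_offset(row + 1, col, line_len)`, which is why both directions need to agree on what counts as a valid cell.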

Additionally, soft wrapping should be togglable, and we'll ideally want the code that handles things like character width, tab stops, text offset <-> screen space queries, etc. to be shared between wrapping and non-wrapping mode where reasonable to do so, to make it easier to keep behavior consistent. And that starts to feel a little out of place in an external text wrapping library, I think...?

I'm sure additional features could be added to textwrap to accommodate these requirements. But at a certain point, it starts to feel like we're pushing code that really belongs in Helix into textwrap just to accommodate our usage of it. And I guess, ultimately, my gut is just telling me that we're probably going to want tighter integration for soft wrapping than we're likely to get with an external library. I could be wrong, of course. But that's where I'm at, at least.

Having said all of that, aside from my maintenance of Ropey, I'm not currently an active contributor to Helix. So I guess no one should take my opinion here with too much weight, ha ha. But I am an invested user, who cares a lot about this particular feature.

cessen avatar Jun 06 '22 23:06 cessen

  • We want to use the simple greedy algorithm, not Knuth or similar. This isn't for performance (btw, kudos on your linear-time implementation!), but rather for UX: globally optimal solutions can cause the editing cursor to jump around unpredictably,

Yeah, definitely. About the linear-time algorithm, I was as surprised as everyone else to learn that it was possible :smile: I found some Python code which I ported to Rust and it seems to work.

The actual hard parts come from a different set of parameters:

  • Since this is a code editor, we'll want to (optionally) preserve the initial indentation of a line in the subsequent soft wrapped portions of the line. Similarly, we'll want soft-wrapped portions to (optionally) have additional indentation as well.

This sounds like something that is outside of what Textwrap should do. Put differently, deciding on the amount of indentation to use is something I would expect the caller of Textwrap to do. So if you find that you need 12 space indentation for the first line and 16 spaces for the subsequent lines, then you can send the text to Textwrap and have it wrap with those prefixes.

In any case, I'll be happy to answer questions about what Textwrap can and cannot do — it's a very simple system at heart (as one would expect) and then it has a few layers on top to make it more flexible.

I ended up having a parallel discussion with @getreu in #2419, I hope you can all align on a good way to use Textwrap (or not) for the different parts of the editor.

mgeisler avatar Jun 11 '22 12:06 mgeisler

@mgeisler Hey Martin, I was just wondering where work on this feature’s at currently. Would be amazing to have :)

aral avatar Sep 23 '22 09:09 aral

Status?

spiderman-idog avatar Nov 14 '22 21:11 spiderman-idog

See https://github.com/helix-editor/helix/pull/417#issuecomment-1303910195

kirawi avatar Nov 14 '22 22:11 kirawi

See #417 (comment)

Thanks!

spiderman-idog avatar Nov 15 '22 00:11 spiderman-idog

Hey @aral, it seems another plan has been made. I'm not directly involved with Helix development, but I'll be happy to adapt Textwrap to make it flexible enough for this use case.

mgeisler avatar Nov 18 '22 19:11 mgeisler

The modifications necessary to support text wrapping and virtual text are too specific to Helix, such as caching breaks. It's not a fault of textwrap.

kirawi avatar Nov 18 '22 21:11 kirawi

Yeah, there are definitely many other factors at play here!

In particular, you would probably end up re-implementing large parts, just like I do in my Wasm demo (see https://github.com/mgeisler/textwrap/blob/master/examples/wasm/src/lib.rs). You'll be using the normal first-fit wrapping algorithm (since optimal-fit wrapping behaves funny when you use it with interactive text, see cargo run --example interactive in a Textwrap checkout) and so you can end up with simpler code by just inlining things.

Now, if you do decide to add a hard-wrap option which inserts actual \n characters in the file, then the optimal-fit wrapping could be really pretty to have. I've been using Emacs for 20 years, and I habitually press M-q (Alt-q) all the time to hard-wrap my text and comments in all sorts of files. I really ought to make that shortcut use Textwrap with the optimal-fit wrapping to see how that would look :smile:

mgeisler avatar Nov 20 '22 22:11 mgeisler

I appreciate all the people working on this. I wouldn't mind a character based unindented soft wrap (similar to what kakoune does) as a starting point.

I love helix but I have to use kakoune to edit my LaTeX files right now because helix doesn't soft wrap. It would be nice to have a basic toggleable soft wrap that could be replaced by an improved version in the future. I'd prefer many of the improvements suggested here, but I wouldn't mind something simple at first if it's a lot faster to release.

kpa28-git avatar Nov 30 '22 01:11 kpa28-git

The rendering portion of text wrapping is implemented in #5008 (including proper handling of indentation and linear splitting at word boundaries, falling back to traditional softwrap when that is not possible). This PR only gets us part of the way there, as the rest of the editor still needs to be adjusted to account for the fact that a single line might take up multiple lines on screen, but it does contain a big portion of the work.

pascalkuthe avatar Dec 05 '22 01:12 pascalkuthe

to edit my LaTeX files right now because helix doesn't soft wrap

oh lol, i was thinking same, am guilty of not hard breaking my para in LaTeX myself - probably as it made a bit harder and unclean to work with that....

But

  • seeing all the issues mentioned here regarding navigations etc etc...
  • combined with my internal will to keep line lengths in check

I have decided to:

  • not run after this soft wrap,
  • rather, opting for the "live" hard wrap[^lhw] or "automatic reflow" dynamically/in realtime i.e. while typing.

[^lhw]: @vlmutolo at https://github.com/helix-editor/helix/issues/136#issuecomment-1109218991

In my own words:

  • what i meant by above "dynamic reflow" is that
  • how about automatically breaking and conjoining lines adhering to some specified character limit per line? (like say 73)
  • fantasizingly: this can be made non-constant to get some pretty dashing ASCII art flowing inside some particular shape like in inkscape. wow.
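
For illustration, the core of such a "dynamic reflow" could look roughly like the sketch below. The `reflow` function is a made-up name and not an existing Helix command. It assumes the paragraph is plain prose, counts width in chars, collapses runs of whitespace to single spaces, and lets a word longer than the limit stick out rather than splitting it.

```rust
/// Joins the lines of a paragraph and re-breaks them greedily so that no row
/// exceeds `limit` chars (unless a single word is longer than the limit).
fn reflow(paragraph: &str, limit: usize) -> String {
    let mut out = String::new();
    let mut col = 0; // chars on the current output line
    for word in paragraph.split_whitespace() {
        let len = word.chars().count();
        if col > 0 && col + 1 + len > limit {
            // The word does not fit after a space: start a new hard line.
            out.push('\n');
            col = 0;
        } else if col > 0 {
            out.push(' ');
            col += 1;
        }
        out.push_str(word);
        col += len;
    }
    out
}
```

Running this on every edit of the surrounding paragraph, with the limit set to something like 73, would give the breaking-and-conjoining behavior described above. Unlike soft wrap, it changes the text itself.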

goyalyashpal avatar Jan 21 '23 21:01 goyalyashpal

Softwrap is already implemented in #5420 and works quite well. I encourage you to try it out. Continuous hardwrap can be implemented based on the work I already did in that PR once it lands.

pascalkuthe avatar Jan 21 '23 22:01 pascalkuthe