textwrap Add `tab_width` support

This merge request is an updated version of #429. It allows the width of a tab to be configured for a variety of this crate's functions, instead of the width being fixed at zero, as it currently is.

Nov 02 '22 02:11 mtoohey31

The fuzz tests need some small fixes, something like this:

modified   fuzz/fuzz_targets/wrap_first_fit.rs
@@ -14,12 +14,12 @@ struct Word {
 #[rustfmt::skip]
 impl core::Fragment for Word {
     fn width(&self) -> f64 { self.width }
-    fn whitespace_width(&self) -> f64 { self.whitespace_width }
+    fn whitespace_width(&self, _: u8) -> f64 { self.whitespace_width }
     fn penalty_width(&self) -> f64 { self.penalty_width }
 }
 
 fuzz_target!(|input: (f64, Vec<Word>)| {
     let width = input.0;
     let words = input.1;
-    let _ = wrap_first_fit(&words, &[width]);
+    let _ = wrap_first_fit(&words, &[width], 0);
 });
modified   fuzz/fuzz_targets/wrap_optimal_fit.rs
@@ -35,7 +35,7 @@ struct Word {
 #[rustfmt::skip]
 impl core::Fragment for Word {
     fn width(&self) -> f64 { self.width }
-    fn whitespace_width(&self) -> f64 { self.whitespace_width }
+    fn whitespace_width(&self, _: u8) -> f64 { self.whitespace_width }
     fn penalty_width(&self) -> f64 { self.penalty_width }
 }
 
@@ -57,5 +57,5 @@ fuzz_target!(|input: (usize, Vec<Word>, Penalties)| {
         }
     }
 
-    let _ = wrap_optimal_fit(&words, &[width as f64], &penalties);
+    let _ = wrap_optimal_fit(&words, &[width as f64], 0, &penalties);
 });
modified   fuzz/fuzz_targets/wrap_optimal_fit_usize.rs
@@ -35,7 +35,7 @@ struct Word {
 #[rustfmt::skip]
 impl core::Fragment for Word {
     fn width(&self) -> f64 { self.width as f64 }
-    fn whitespace_width(&self) -> f64 { self.whitespace_width as f64 }
+    fn whitespace_width(&self, _: u8) -> f64 { self.whitespace_width as f64 }
     fn penalty_width(&self) -> f64 { self.penalty_width as f64 }
 }
 
@@ -45,5 +45,5 @@ fuzz_target!(|input: (usize, Vec<Word>, Penalties)| {
     let width = input.0;
     let words = input.1;
     let penalties = input.2.into();
-    let _ = wrap_optimal_fit(&words, &[width as f64], &penalties);
+    let _ = wrap_optimal_fit(&words, &[width as f64], 0, &penalties);
 });

With that, a problem will appear: running cargo fuzz run wrap_fast_path fails on input like

    let input = (String::from("foo \t"), 20);

You'll get

thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `["foo \t"]`,
 right: `["foo "]`', fuzz_targets/wrap_fast_path.rs:20:5

The problem is that we end up with two words instead of just one:

[src/textwrap/src/word_separators.rs:293] &opportunities = IntoIter(
    [
        (
            4,
            Allowed,
        ),
        (
            5,
            Mandatory,
        ),
    ],
)
[src/wrap.rs:231] &words = [
    Word {
        word: "foo",
        whitespace: " ",
        penalty: "",
        width: 3,
    },
    Word {
        word: "",
        whitespace: "\t",
        penalty: "",
        width: 0,
    },
]

The fast-path optimization assumes that every word matters, but here the second word should actually have been merged with the first word (with whitespace: " \t",). So when the slow path runs, it will add "foo" (word), then " " (whitespace) and finally "" (word) but no whitespace from the second word since it's the final word of the line.

I think we might be able to fix this if we run trim_end_matches(&[' ', '\t']) before finding words. We don't output the trailing whitespace any longer, so it should be correct to do that. This seems to work from some light testing:

modified   src/core.rs
@@ -243,7 +243,7 @@ impl<'a> Word<'a> {
     /// All trailing whitespace is automatically taken to be the whitespace part
     /// of the word.
     pub fn from(word: &str) -> Word<'_> {
-        let trimmed = word.trim_end();
+        let trimmed = word.trim_end_matches(&[' ', '\t']);
         Word {
             word: trimmed,
             // trimmed shouldn't contain whitespace, so we don't need to pass
modified   src/wrap.rs
@@ -203,7 +203,7 @@ pub(crate) fn wrap_single_line<'a>(
         options.subsequent_indent
     };
     if line.len() < options.width && options.tab_width <= 1 && indent.is_empty() {
-        lines.push(Cow::from(line.trim_end_matches(' ')));
+        lines.push(Cow::from(line.trim_end_matches(&[' ', '\t'])));
     } else {
         wrap_single_line_slow_path(line, options, lines)
     }
@@ -217,6 +217,8 @@ pub(crate) fn wrap_single_line_slow_path<'a>(
     options: &Options<'_>,
     lines: &mut Vec<Cow<'a, str>>,
 ) {
+    let line = line.trim_end_matches(&[' ', '\t']);
+
     let initial_width = options
         .width
         .saturating_sub(display_width(options.initial_indent, options.tab_width));

Nov 20 '22 21:11 mgeisler

Fuzz tests should :crossed_fingers: be all fixed in 8b2a9f18089f01e8d6b329581ebce8c9d2e32129, and the wasm demo should be fixed in f1352c88fdc6986eb0cbb8675a41f14502987102. I'm just ignoring the tab_width parameter in the wasm demo, because web_sys::CanvasRenderingContext2d::measure_text should already register tab widths correctly, right?

Nov 29 '22 03:11 mtoohey31

Hey @mtoohey31! Happy new year... :smile:

Can I get you to squash your fixes into the commits which introduced the problems in the first place? That will make the commits here easier to review. It would also be great if you can rebase on top of the latest master so we can get fresh test run.

Feb 09 '23 21:02 mgeisler

Hey @mtoohey31! Happy new year... :smile:

Haha, thanks :slightly_smiling_face: Same to you!

Can I get you to squash your fixes into the commits which introduced the problems in the first place? That will make the commits here easier to review. It would also be great if you can rebase on top of the latest master so we can get fresh test run.

Sure, should be rebased now! Pretty much all of the extra commits were fixes for problems, so I've squashed everything into one commit. Hope that's ok; if there was another way you'd like me to split things up though, let me know.

Feb 16 '23 04:02 mtoohey31

Pretty much all of the extra commits were fixes for problems, so I've squashed everything into one commit. Hope that's ok; if there was another way you'd like me to split things up though, let me know.

That is precisely what I was hoping for — fixes to problems found in PR discussions are not useful for the long-term history, so they should ideally be squashed away before merging. Thank you!

Mar 26 '23 21:03 mgeisler

Hi! What's the status of this? I'd be interested in helping to get it over the finish line if there's any more work required (I'm keen to get https://github.com/helix-editor/helix/issues/3622 fixed).

Also thanks for the excellent crate. :)

Jul 28 '23 15:07 callum-oakley

@mgeisler I see there are some conflicts now, I would still like to get this fixed at some point though. If I rebase the changes, would you possibly have time to look at the merge request again? Or is there an issue with the general approach that you'd like to resolve before we move forward? I don't mind starting the PR from scratch with a new approach if that's what has to happen.

Mar 19 '24 03:03 mtoohey31

textwrap textwrap copied to clipboard

Add `tab_width` support

textwrap
textwrap copied to clipboard