textwrap Improvements to `unfill` and `refill` from the runwrap crate

In #224, @veikman mentioned his new runwrap library and it has this comment:

/// Preserve initial indentation on unwrapping.
/// This is a workaround for textwrap’s tendency to interpret non-alphanumeric leading characters
/// as indentation (e.g. comment syntax) and destroy it. What textwrap calls “subsequent_indent” is
/// destroyed without comment.

I would love to hear more and discuss how we can improve this.

A few weeks ago, I downloaded all public crates which depend on textwrap, and none of them use `unfill` or `refill`. In other words, these functions are pretty new and unproven. I'm sure we can make them better!

Jun 06 '21 20:06 mgeisler

I found the behaviour in question when I ran textwrap on an ordinary Markdown heading. There is a unit test case for this here. In that particular use case (Markdown), I don’t think of leading hash characters as indentation, but I suppose that could be a matter of opinion.

I imagine it would be possible to expand the Options struct with limitations on what can be considered indentation for the purpose of unfilling, but I don’t yet know enough about the problem to specify how, and I’m not sure it’s important. The workaround does what it should for now.

Jun 07 '21 09:06 veikman

Thanks for the explanation! So the problem is that unfill sees

# A heading

as "A heading" with "# " as initial_indent. That does indeed seem rather silly :smile:

I guess it would work much better if we restrict the heuristic a bit:

- Only look for initial and subsequent indentation in multi-line strings. That should prevent a lot of misinterpretations.
- Perhaps only set initial_indent if it is equal to what we detect as subsequent_indent?
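
To make the two restrictions concrete, here is a rough, standalone sketch of what such a detection step could look like. It is not textwrap's actual implementation: the `detect_indents` and `leading_prefix` helpers and the `PREFIX_CHARS` set are all hypothetical names chosen for illustration, and the sketch only covers indentation detection, not the unwrapping itself.

```rust
/// Characters that may count as indentation. This set is an assumption made
/// for illustration; it is not claimed to match textwrap's own list.
const PREFIX_CHARS: &[char] = &[' ', '-', '+', '*', '>', '#', '/'];

/// Return the leading run of prefix characters in `line`.
fn leading_prefix(line: &str) -> &str {
    let rest = line.trim_start_matches(PREFIX_CHARS);
    &line[..line.len() - rest.len()]
}

/// Hypothetical restricted heuristic.
/// Returns `(initial_indent, subsequent_indent)`.
fn detect_indents(text: &str) -> (&str, &str) {
    let lines: Vec<&str> = text.lines().collect();

    // Restriction 1: never guess indentation from a single line.
    if lines.len() < 2 {
        return ("", "");
    }

    // The subsequent indent is the longest prefix shared by every line
    // after the first one.
    let mut subsequent = leading_prefix(lines[1]);
    for &line in &lines[2..] {
        let prefix = leading_prefix(line);
        let common: usize = subsequent
            .chars()
            .zip(prefix.chars())
            .take_while(|(a, b)| a == b)
            .map(|(a, _)| a.len_utf8())
            .sum();
        subsequent = &subsequent[..common];
    }

    // Restriction 2: only accept an initial indent that is identical to the
    // subsequent indent; otherwise leave the first line untouched.
    let first = leading_prefix(lines[0]);
    let initial = if first == subsequent { first } else { "" };

    (initial, subsequent)
}
```

Under these assumptions, `detect_indents("# A heading")` reports no indentation at all, so the heading would survive unchanged, while a two-line `// ` comment still yields `"// "` for both indents. One consequence of the second restriction is that a heading followed by unindented text also keeps its `# `, because the first line's prefix differs from the (empty) subsequent indent.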

Jun 08 '21 22:06 mgeisler