
Created a `linebreaks_iter()` variant of `linebreaks`

Open nwoods-cimpress opened this issue 1 year ago • 2 comments

`linebreaks()` has a very simple interface: it takes a `&str` and returns an iterator of break opportunities, each tagged with its byte position in the string. This change introduces a `linebreaks_iter()` variant that, instead of a `&str`, takes:

  • an iterator yielding `(index, char)` pairs, where the index type does not have to be `usize`;
  • a `final_idx` parameter giving the index just past the last character (for a `&str`, its length).
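To make the two parameters concrete, here is a toy sketch of the shape `linebreaks_iter()` could take; it is not the PR's code. It is generic over the index type `T`. In this sketch each break is reported at the index of the character after it, so the final mandatory break needs an index that no character carries; that is the role `final_idx` plays here. The break rules are deliberate placeholders (allow a break after a run of spaces, force one after `\n` and at the end), not the UAX #14 rules the crate implements, and `BreakOpportunity` is a local stand-in for the crate's enum of the same name.

```rust
/// Local stand-in for the crate's `BreakOpportunity` enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BreakOpportunity {
    Mandatory,
    Allowed,
}

/// Toy sketch of a `linebreaks_iter()`-style API, generic over the index type `T`.
/// Placeholder rules (NOT UAX #14): allow a break after a run of spaces,
/// force one after '\n', and always force one at `final_idx`.
pub fn linebreaks_iter<T, I>(
    iter: I,
    final_idx: T,
) -> impl Iterator<Item = (T, BreakOpportunity)> + Clone
where
    T: Copy,
    I: Iterator<Item = (T, char)> + Clone,
{
    // The previous character is the only state; it is captured by value,
    // so the closure (and therefore the whole iterator) is Clone.
    let mut prev: Option<char> = None;
    iter.filter_map(move |(idx, c)| {
        // A break is reported at the index of the character *after* it.
        let brk = match prev {
            Some('\n') => Some((idx, BreakOpportunity::Mandatory)),
            Some(' ') if c != ' ' => Some((idx, BreakOpportunity::Allowed)),
            _ => None,
        };
        prev = Some(c);
        brk
    })
    // No character follows the end of the text, hence the explicit `final_idx`.
    .chain(std::iter::once((final_idx, BreakOpportunity::Mandatory)))
}

/// The `&str` wrapper from the PR description, written against the sketch above.
pub fn linebreaks(s: &str) -> impl Iterator<Item = (usize, BreakOpportunity)> + Clone + '_ {
    linebreaks_iter(s.char_indices(), s.len())
}
```

Because `T` only has to be `Copy`, the same function accepts `u32` offsets or any other position type the caller chooses.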

The original linebreaks() function can now easily be implemented in terms of linebreaks_iter():

```rust
pub fn linebreaks(s: &str) -> impl Iterator<Item = (usize, BreakOpportunity)> + Clone + '_ {
    linebreaks_iter(s.char_indices(), s.len())
}
```

This decouples the core algorithm from the string's representation, so text can be passed in other forms (e.g. UTF-16, or UTF-8 that is not held in a `&str`), or even come from a complex data structure in which the text is not stored as a single contiguous string.
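As an illustration of the UTF-16 case, the caller's side could look like the hypothetical adapter below (not part of the crate). It decodes a `&[u16]` buffer with the standard library's `char::decode_utf16` and pairs each `char` with its offset in UTF-16 code units, stored as `u32` rather than `usize`. Mapping unpaired surrogates to U+FFFD is an assumed policy. The resulting iterator, together with `units.len() as u32`, would be the two arguments handed to `linebreaks_iter()`.

```rust
/// Hypothetical adapter (not part of the crate): yields `(utf16_offset, char)` pairs.
pub fn utf16_char_indices(units: &[u16]) -> impl Iterator<Item = (u32, char)> + Clone + '_ {
    // Running offset in UTF-16 code units; captured by value so the iterator stays Clone.
    let mut offset: u32 = 0;
    char::decode_utf16(units.iter().copied()).map(move |r| {
        // Assumed policy: replace an unpaired surrogate with U+FFFD,
        // which is also one code unit long, so offsets stay aligned.
        let c = r.unwrap_or(char::REPLACEMENT_CHARACTER);
        let idx = offset;
        offset += c.len_utf16() as u32;
        (idx, c)
    })
}
```

For `"a😀b"` the offsets are 0, 1 and 3, because U+1F600 occupies two code units, and the final index would be 4.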

nwoods-cimpress, Oct 09 '24

I think it is unfortunate that the `iter` argument to `linebreaks_iter()` must implement `Clone`. The only reason is that the returned iterator must implement `Clone` so that `linebreaks()` can keep its existing `impl Iterator + Clone` return type.

If there is a way in Rust to implement `Clone` conditionally, I don't know of it.
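One possible approach, sketched here rather than taken from the PR, is to return a named struct instead of `impl Iterator`. `#[derive(Clone)]` on a generic struct generates an impl bounded on every type parameter (`impl<I: Clone, T: Clone> Clone for LineBreaks<I, T>`), so the iterator is `Clone` exactly when its inputs are. `LineBreaks` is a hypothetical name, and the break rules are the same toy placeholders as before, not UAX #14.

```rust
/// Local stand-in for the crate's `BreakOpportunity` enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BreakOpportunity {
    Mandatory,
    Allowed,
}

/// Hypothetical named iterator type. The derive only implements Clone
/// when both `I` and `T` are Clone, i.e. Clone is conditional.
#[derive(Clone)]
pub struct LineBreaks<I, T> {
    iter: I,
    prev: Option<char>,
    final_idx: Option<T>,
}

/// No `Clone` bound on `I` is needed here.
pub fn linebreaks_iter<T, I>(iter: I, final_idx: T) -> LineBreaks<I, T>
where
    I: Iterator<Item = (T, char)>,
{
    LineBreaks { iter, prev: None, final_idx: Some(final_idx) }
}

impl<T, I> Iterator for LineBreaks<I, T>
where
    I: Iterator<Item = (T, char)>,
{
    type Item = (T, BreakOpportunity);

    fn next(&mut self) -> Option<Self::Item> {
        // Toy rules (NOT UAX #14): break after '\n' or after a run of spaces.
        for (idx, c) in self.iter.by_ref() {
            match self.prev.replace(c) {
                Some('\n') => return Some((idx, BreakOpportunity::Mandatory)),
                Some(' ') if c != ' ' => return Some((idx, BreakOpportunity::Allowed)),
                _ => {}
            }
        }
        // Input exhausted: emit the final mandatory break exactly once.
        self.final_idx.take().map(|i| (i, BreakOpportunity::Mandatory))
    }
}
```

With this shape, `linebreaks()` can still advertise `impl Iterator<Item = (usize, BreakOpportunity)> + Clone + '_`, because `CharIndices` is `Clone`, while callers passing a non-`Clone` source simply get a non-`Clone` iterator.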

nwoods-cimpress, Nov 12 '24

@axelf4 any thoughts on this PR?

nwoods-cimpress, Feb 06 '25

I came here to make exactly this change. My use case is similar to those mentioned in #1, and this PR would be great for my needs.

ccbrown, Apr 24 '25