unicode-linebreak
Created a `linebreaks_iter()` variant of `linebreaks`
`linebreaks` has a very simple interface: it takes a `&str` and returns an iterator of break opportunities tagged with their positions. This change introduces a `linebreaks_iter()` variant that, instead of taking a `&str`, takes:
- An iterator yielding each `char` together with an arbitrary index, which does not necessarily have to be a `usize`
- A `final_idx` parameter containing the final index.
The original `linebreaks()` function can now easily be implemented in terms of `linebreaks_iter()`:
pub fn linebreaks(s: &str) -> impl Iterator<Item = (usize, BreakOpportunity)> + Clone + '_ {
    linebreaks_iter(s.char_indices(), s.len())
}
This decouples the core algorithm from the representation of the string, so text can be passed in alternative formats (e.g. UTF-16 or UTF-8) or possibly from a complex data structure where the text is not conveniently a single string.
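As a rough illustration of the UTF-16 case, the sketch below builds the kind of input iterator described above: `(index, char)` pairs where the index counts UTF-16 code units rather than UTF-8 bytes. The helper name `utf16_char_indices` is hypothetical and not part of this PR, and the call shape in the comment assumes `linebreaks_iter(iter, final_idx)` as described here.

```rust
/// Hypothetical helper (not part of this PR): yields `(offset, char)` pairs
/// for UTF-16 input, where `offset` is measured in UTF-16 code units.
/// Unpaired surrogates are replaced with U+FFFD, which still occupies
/// exactly one code unit, so the offsets stay aligned with the input.
fn utf16_char_indices(units: &[u16]) -> impl Iterator<Item = (usize, char)> + Clone + '_ {
    char::decode_utf16(units.iter().copied()).scan(0usize, |offset, decoded| {
        let c = decoded.unwrap_or(char::REPLACEMENT_CHARACTER);
        let start = *offset;
        // Advance by one code unit for BMP characters, two for surrogate pairs.
        *offset += c.len_utf16();
        Some((start, c))
    })
}

// Assumed usage with the function proposed in this PR:
//     linebreaks_iter(utf16_char_indices(&units), units.len())
```

The returned iterator is `Clone` because every adapter in the chain is `Clone` when its input is, which matters given the `Clone` requirement on the `iter` parameter.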
I think it is unfortunate that the `iter` parameter passed to `linebreaks_iter()` needs to implement `Clone`. This is only because the returned iterator needs to implement `Clone` to support the normal `linebreaks()` call.
If there is a way in Rust to implement `Clone` on a conditional basis, I don't know it.
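One pattern that might apply, offered as a sketch rather than a tested change to this PR: an `impl Iterator + Clone` return type always promises `Clone`, but a named generic struct with `#[derive(Clone)]` is `Clone` only when its type parameter is. The `Numbered` type below is purely hypothetical and only stands in for whatever iterator `linebreaks_iter()` would return.

```rust
/// Hypothetical stand-in for a named return type. `#[derive(Clone)]` on a
/// generic struct generates `impl<I: Clone> Clone for Numbered<I>`, so the
/// wrapper is cloneable exactly when the wrapped iterator is.
#[derive(Clone)]
pub struct Numbered<I> {
    inner: I,
    next_idx: usize,
}

impl<I> Numbered<I> {
    pub fn new(inner: I) -> Self {
        Numbered { inner, next_idx: 0 }
    }
}

// No `Clone` bound here: non-cloneable inputs can still be iterated.
impl<I: Iterator> Iterator for Numbered<I> {
    type Item = (usize, I::Item);

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.inner.next()?;
        let idx = self.next_idx;
        self.next_idx += 1;
        Some((idx, item))
    }
}
```

With this shape, a caller passing `s.char_indices()` would get a cloneable iterator back, while a caller passing a non-`Clone` iterator could still use the result, at the cost of exposing a named type in the public API.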
@axelf4 any thoughts on this PR?
I came here to make exactly this change. My use-case is similar to those mentioned in #1 and this PR would be great for my needs.