ICU4N

Create extension methods for common BreakIterator operations

Open: NightOwl888 opened this issue 4 years ago • 1 comment

While BreakIterator provides powerful low-level functionality for iterating forward and backward through breaks, it would be useful to have a simple way to perform forward-only iteration over a string, StringBuilder, or char[]. For example:

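```csharp
IEnumerable<int> wordBreaks = theString.ToWordBreaks();
foreach (int boundary in wordBreaks) // "break" is a C# keyword and cannot be a variable name
{
    // consume each boundary (a char offset into theString)
}
```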

Or, with an explicit culture:

```csharp
IEnumerable<int> sentenceBreaks = theString.ToSentenceBreaks(new CultureInfo("th"));
foreach (int boundary in sentenceBreaks)
{
    // consume each boundary (a char offset into theString)
}
```
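A minimal sketch of how these two methods might be layered on ICU4N's existing `BreakIterator`. It assumes the ICU4N members `BreakIterator.GetWordInstance(CultureInfo)`, `BreakIterator.GetSentenceInstance(CultureInfo)`, `SetText(string)`, `First()`, `Next()` and the `BreakIterator.Done` sentinel behave like their ICU4J counterparts; the class name `BreakIteratorExtensions` and the helper `EnumerateBreaks` are hypothetical and not part of ICU4N. Only `string` is covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using ICU4N.Text; // assumed namespace of ICU4N's BreakIterator

// Hypothetical extension class; not part of ICU4N.
public static class BreakIteratorExtensions
{
    // Word boundaries using the current culture.
    public static IEnumerable<int> ToWordBreaks(this string text)
        => text.ToWordBreaks(CultureInfo.CurrentCulture);

    // Word boundaries using an explicit culture.
    public static IEnumerable<int> ToWordBreaks(this string text, CultureInfo culture)
        => EnumerateBreaks(text, () => BreakIterator.GetWordInstance(culture));

    // Sentence boundaries using the current culture.
    public static IEnumerable<int> ToSentenceBreaks(this string text)
        => text.ToSentenceBreaks(CultureInfo.CurrentCulture);

    // Sentence boundaries using an explicit culture.
    public static IEnumerable<int> ToSentenceBreaks(this string text, CultureInfo culture)
        => EnumerateBreaks(text, () => BreakIterator.GetSentenceInstance(culture));

    // Yields every boundary offset, from 0 (start of text) through text.Length (end of text).
    // The iterator is created inside this iterator method, so every foreach over the
    // returned sequence gets its own BreakIterator instance.
    private static IEnumerable<int> EnumerateBreaks(string text, Func<BreakIterator> createIterator)
    {
        BreakIterator iterator = createIterator();
        iterator.SetText(text);
        for (int boundary = iterator.First(); boundary != BreakIterator.Done; boundary = iterator.Next())
        {
            yield return boundary;
        }
    }
}
```

`ToLineBreaks` and `ToCharacterBreaks` would follow the same pattern using `BreakIterator.GetLineInstance` and `BreakIterator.GetCharacterInstance`, and `StringBuilder` or `char[]` overloads could simply convert to a string first (at the cost of a copy). Creating the iterator per enumeration is one possible answer to the per-thread requirement raised in this issue, though it gives up any reuse of iterator instances.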

Ideally we would create a separate extension method (with overloads that accept an optional culture) for each of the four break modes:

  1. Word
  2. Sentence
  3. Line
  4. Character

We could then build higher-level operations on top of this, such as an IEnumerable<string> that tokenizes the text so each token can be consumed in a foreach loop.

```csharp
foreach (string word in theText.ToWords(new CultureInfo("th-TH")))
{
    // consume each word
}
```
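A sketch of the higher-level `ToWords` operation, assuming the same ICU4N `BreakIterator` members (`GetWordInstance(CultureInfo)`, `SetText(string)`, `First()`, `Next()`, `BreakIterator.Done`) behave as in ICU4J. It slices the text between consecutive word boundaries. ICU's word iterator also reports runs of whitespace and punctuation as segments, so this sketch keeps only segments containing at least one letter or digit; that filter is a simplifying assumption, and ICU's word-break rule status would be a more precise criterion. `WordTokenizerExtensions` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using ICU4N.Text; // assumed namespace of ICU4N's BreakIterator

// Hypothetical extension class; not part of ICU4N.
public static class WordTokenizerExtensions
{
    public static IEnumerable<string> ToWords(this string text, CultureInfo culture)
    {
        // Created inside the iterator method, so each enumeration has its own instance.
        BreakIterator iterator = BreakIterator.GetWordInstance(culture);
        iterator.SetText(text);

        // Walk consecutive boundary pairs [start, end).
        int start = iterator.First();
        for (int end = iterator.Next(); end != BreakIterator.Done; start = end, end = iterator.Next())
        {
            string segment = text.Substring(start, end - start);

            // Skip whitespace and punctuation segments; keep anything with a letter or digit.
            if (segment.Any(char.IsLetterOrDigit))
            {
                yield return segment;
            }
        }
    }
}
```

For example, `"Hello, world!".ToWords(new CultureInfo("en"))` yields `"Hello"` and `"world"`; the `","`, `" "` and `"!"` segments are filtered out.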

Thread safety needs some thought: a BreakIterator instance must not be shared between threads, so each thread requires its own clone.

NightOwl888, Oct 12 '19 18:10