ICU4N
Create extension methods for common BreakIterator operations
While `BreakIterator` provides great low-level functionality for iterating forward and backward through breaks, it would be useful to have a simple way to perform forward-only operations on `string`, `StringBuilder`, and `char[]`.
```csharp
IEnumerable<int> wordBreaks = theString.ToWordBreaks();
foreach (var boundary in wordBreaks) // "break" is a C# keyword and cannot be a variable name
{
    // consume
}
```
Or, with an explicit culture:
```csharp
IEnumerable<int> sentenceBreaks = theString.ToSentenceBreaks(new CultureInfo("th"));
foreach (var boundary in sentenceBreaks)
{
    // consume
}
```
Ideally, we would create a separate extension method (with an overload that accepts an optional culture) for each of the four break modes:
- Word
- Sentence
- Line
- Character
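A minimal sketch of the forward-only wrapper for the word mode, assuming the ICU-style protocol in which `First()` returns the first boundary, `Next()` advances, and a `-1` sentinel (ICU's DONE) marks the end. To keep it self-contained and runnable, `IBoundaryIterator` and the whitespace-based `WhitespaceBoundaries` are hypothetical stand-ins, not ICU4N types; a real implementation would wrap an ICU4N `BreakIterator` created for the requested mode and culture, and would add the culture overloads described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for ICU's iteration protocol; NOT an ICU4N type.
public interface IBoundaryIterator
{
    void SetText(string text);
    int First(); // moves to and returns the first boundary (0)
    int Next();  // returns the next boundary, or Boundary.Done when exhausted
}

public static class Boundary
{
    public const int Done = -1; // mirrors the DONE sentinel of ICU break iterators
}

// Hypothetical toy "word" iterator: boundaries at the start and end of the text
// and wherever the text switches between whitespace and non-whitespace.
public sealed class WhitespaceBoundaries : IBoundaryIterator
{
    private string text = string.Empty;
    private int pos;

    public void SetText(string text) { this.text = text; pos = 0; }

    public int First() { pos = 0; return 0; }

    public int Next()
    {
        if (pos >= text.Length) return Boundary.Done;
        bool isSpace = char.IsWhiteSpace(text[pos]);
        do { pos++; } while (pos < text.Length && char.IsWhiteSpace(text[pos]) == isSpace);
        return pos;
    }
}

public static class BreakExtensions
{
    // Hypothetical forward-only API in the shape this issue proposes.
    public static IEnumerable<int> ToWordBreaks(this string text)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        return Enumerate(text, () => new WhitespaceBoundaries());
    }

    public static IEnumerable<int> ToWordBreaks(this StringBuilder text)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        return ToWordBreaks(text.ToString()); // snapshot: later edits to the builder are not seen
    }

    public static IEnumerable<int> ToWordBreaks(this char[] text)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        return ToWordBreaks(new string(text));
    }

    // The factory runs inside the iterator body, so each enumeration gets a fresh iterator.
    private static IEnumerable<int> Enumerate(string text, Func<IBoundaryIterator> factory)
    {
        IBoundaryIterator iterator = factory();
        iterator.SetText(text);
        for (int b = iterator.First(); b != Boundary.Done; b = iterator.Next())
            yield return b;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(string.Join(" ", "Hello big world".ToWordBreaks())); // prints "0 5 6 9 10 15"
        Console.WriteLine(string.Join(" ", new[] { 'a', ' ', 'b' }.ToWordBreaks())); // prints "0 1 2 3"
    }
}
```

Splitting each public method from a private `yield`-based helper makes argument validation happen immediately, while the iteration itself stays lazy and forward-only.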
We could then build on this to provide higher-level operations, such as an `IEnumerable<string>` that tokenizes the text so it can be iterated with a `foreach` loop.
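One way to layer the `IEnumerable<string>` tokenizer on top of the boundary sequence is to emit the substring between each pair of consecutive boundaries, then drop segments that are not words. In this sketch `ToSegments` and `OnlyWords` are hypothetical helpers, the boundaries are written out by hand as assumed word-break output, and the letter-or-digit filter is a simplification; a real version could use the break iterator's rule status (`getRuleStatus()` in ICU4J) to tell words from punctuation and whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SegmentExtensions
{
    // Hypothetical helper: converts a sorted boundary sequence (0 ... text.Length,
    // as a break iterator reports it) into the substrings between consecutive boundaries.
    public static IEnumerable<string> ToSegments(this string text, IEnumerable<int> boundaries)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (boundaries is null) throw new ArgumentNullException(nameof(boundaries));
        return Enumerate(text, boundaries);
    }

    private static IEnumerable<string> Enumerate(string text, IEnumerable<int> boundaries)
    {
        int previous = -1; // -1 means "no boundary seen yet"
        foreach (int boundary in boundaries)
        {
            if (previous >= 0)
                yield return text.Substring(previous, boundary - previous);
            previous = boundary;
        }
    }

    // Hypothetical word filter: keeps segments that contain a letter or digit,
    // dropping whitespace and punctuation runs.
    public static IEnumerable<string> OnlyWords(this IEnumerable<string> segments) =>
        segments.Where(s => s.Any(c => char.IsLetterOrDigit(c)));
}

public static class Program
{
    public static void Main()
    {
        string text = "Hello, big world";
        // Assumed word boundaries, written out by hand for this sketch.
        int[] boundaries = { 0, 5, 6, 7, 10, 11, 16 };

        Console.WriteLine(string.Join("|", text.ToSegments(boundaries)));             // prints "Hello|,| |big| |world"
        Console.WriteLine(string.Join("|", text.ToSegments(boundaries).OnlyWords())); // prints "Hello|big|world"
    }
}
```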
```csharp
foreach (var word in theText.ToWords(new CultureInfo("th-TH")))
{
    // consume each word
}
```
Some thought needs to be given to thread safety, since a `BreakIterator` instance holds mutable iteration state and cannot be shared across threads; each thread requires its own clone.
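One possible approach, sketched under assumptions: cache one prototype iterator per culture, never iterate it directly, and clone it at the start of every enumeration so that concurrent `foreach` loops never share state. `ToyCharIterator` is a hypothetical, deliberately simplified stand-in (one boundary per UTF-16 code unit rather than per grapheme cluster); with ICU4N the prototype would be a real `BreakIterator`, and whether cloning a shared, unmodified instance is safe should be confirmed against the ICU4N documentation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a mutable, non-thread-safe break iterator (toy logic only).
public sealed class ToyCharIterator : ICloneable
{
    public const int Done = -1;
    private string text = string.Empty;
    private int pos;

    public void SetText(string value) { text = value; pos = 0; }
    public int First() { pos = 0; return 0; }
    public int Next() => pos < text.Length ? ++pos : Done; // one boundary per UTF-16 code unit
    public object Clone() => MemberwiseClone();
}

public static class ThreadSafeBreaks
{
    // One prototype per culture name. Prototypes are only ever cloned, never mutated,
    // so reading them from several threads at once is safe.
    private static readonly ConcurrentDictionary<string, ToyCharIterator> Prototypes =
        new ConcurrentDictionary<string, ToyCharIterator>();

    public static IEnumerable<int> ToCharacterBreaks(this string text, CultureInfo culture)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (culture is null) throw new ArgumentNullException(nameof(culture));
        return Enumerate(text, culture);
    }

    private static IEnumerable<int> Enumerate(string text, CultureInfo culture)
    {
        // This body runs once per enumeration, so every foreach gets its own clone.
        ToyCharIterator prototype = Prototypes.GetOrAdd(culture.Name, _ => new ToyCharIterator());
        var iterator = (ToyCharIterator)prototype.Clone();
        iterator.SetText(text);
        for (int b = iterator.First(); b != ToyCharIterator.Done; b = iterator.Next())
            yield return b;
    }
}

public static class Program
{
    public static void Main()
    {
        var counts = new int[8];
        Parallel.For(0, counts.Length, i =>
        {
            // Each parallel enumeration works on a private clone, so no iteration state is shared.
            counts[i] = "abcd".ToCharacterBreaks(CultureInfo.InvariantCulture).Count();
        });
        Console.WriteLine(string.Join(",", counts)); // prints "5,5,5,5,5,5,5,5"
    }
}
```

Because the clone happens inside the iterator body rather than when the extension method is called, even a single `IEnumerable<int>` that is enumerated on two threads at once stays safe.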