Support tokenizing a span using more than one separator

Open skarllot opened this issue 2 years ago • 3 comments

Overview

Sometimes a [ReadOnly]Span needs to be tokenized using more than one separator.

API breakdown

namespace CommunityToolkit.HighPerformance;

public static class SpanExtensions
{
    public static SpanTokenizer2<T> Tokenize<T>(this Span<T> span, T separator0, T separator1);
    public static SpanTokenizer3<T> Tokenize<T>(this Span<T> span, T separator0, T separator1, T separator2);
    public static SpanTokenizerAny<T> Tokenize<T>(this Span<T> span, ReadOnlySpan<T> separators);
}
namespace CommunityToolkit.HighPerformance;

public static class ReadOnlySpanExtensions
{
    public static ReadOnlySpanTokenizer2<T> Tokenize<T>(this ReadOnlySpan<T> span, T separator0, T separator1);
    public static ReadOnlySpanTokenizer3<T> Tokenize<T>(this ReadOnlySpan<T> span, T separator0, T separator1, T separator2);
    public static ReadOnlySpanTokenizerAny<T> Tokenize<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> separators);
}

Usage example

ReadOnlySpan<char> content = "John; 1960, USA";

// Proposed overload: splits on either ';' or ','.
foreach (var token in content.Tokenize(';', ','))
{
    Console.WriteLine(token.ToString());
}

Breaking change?

No

Alternatives

Not that I'm aware of.

Additional context

The new tokenizer structs can use IndexOfAny instead of IndexOf.
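To make the `IndexOfAny` idea concrete, here is a minimal sketch of what a two-separator tokenizer could look like. The type name `ReadOnlySpanTokenizer2Sketch<T>` and its field layout are assumptions for illustration only; the issue names `ReadOnlySpanTokenizer2<T>` but does not define it, and the toolkit's actual implementation may differ.

```csharp
using System;

// Hypothetical sketch, not the toolkit's implementation: enumerates the
// tokens of a span that are delimited by either of two separators.
public ref struct ReadOnlySpanTokenizer2Sketch<T>
    where T : IEquatable<T>
{
    private readonly ReadOnlySpan<T> span;
    private readonly T separator0;
    private readonly T separator1;

    // Start (inclusive) and end (exclusive) of the current token.
    private int start;
    private int end;

    public ReadOnlySpanTokenizer2Sketch(ReadOnlySpan<T> span, T separator0, T separator1)
    {
        this.span = span;
        this.separator0 = separator0;
        this.separator1 = separator1;
        this.start = 0;
        this.end = -1; // So that the first MoveNext() starts at index 0.
    }

    // Lets the struct be consumed directly by foreach.
    public ReadOnlySpanTokenizer2Sketch<T> GetEnumerator() => this;

    public bool MoveNext()
    {
        int newStart = this.end + 1;

        // Past the end: the previous token already reached the end of the span.
        if (newStart > this.span.Length)
        {
            return false;
        }

        this.start = newStart;

        // One IndexOfAny call finds the nearest of the two separators.
        int index = this.span.Slice(newStart).IndexOfAny(this.separator0, this.separator1);

        this.end = index >= 0 ? newStart + index : this.span.Length;

        return true;
    }

    public ReadOnlySpan<T> Current => this.span.Slice(this.start, this.end - this.start);
}
```

Iterating `new ReadOnlySpanTokenizer2Sketch<char>("John; 1960, USA", ';', ',')` with `foreach` yields `John`, ` 1960`, and ` USA`. Whitespace around tokens is preserved, so callers would trim it themselves if needed. A three-separator or span-of-separators variant could follow the same shape using the corresponding `IndexOfAny` overloads.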

Help us help you

Yes, I'd like to be assigned to work on this item

skarllot • Dec 02 '23 17:12

Can this feature be implemented?

skarllot • Apr 07 '24 13:04