rx-ranges

RFE: string extension - split, strip, join

Status: Open. Xecutor opened this issue 5 years ago (4 comments).

Hello.

I have a preliminary implementation of a split range, a strip transform, and a join sink, but I'm not sure whether they should be part of the main library or placed in a separate file as an optional extension.

split takes two const iterators delimiting the input and two const iterators delimiting the separator, and produces basic_string_view<CharT> elements. strip takes a range of const CharT*, basic_string_view<CharT> or basic_string<CharT> as input and produces basic_string_view<CharT> as output. join takes a range of const CharT*, basic_string_view<CharT> or basic_string<CharT>, plus a separator given as const CharT*, basic_string_view<CharT> or basic_string<CharT>, and produces a basic_string<CharT> as output.
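For illustration, here is a minimal eager sketch of the three operations as described above, written as plain free functions rather than as rx-ranges stages. The names split, strip and join follow the proposal, but these bodies, the helper is_blank, and the use of const CharT* as the "const iterators" (so each piece can be a basic_string_view over the original text) are assumptions, not the proposed implementation. This strip works on one view at a time and only treats space, tab, LF and CR as blanks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical eager split: [first, last) is the input and [sfirst, slast)
// the separator. Raw pointers play the role of the "const iterators" so that
// each piece can be a basic_string_view (pointer + size) into the input.
template <class CharT>
std::vector<std::basic_string_view<CharT>>
split(const CharT* first, const CharT* last,
      const CharT* sfirst, const CharT* slast) {
    std::vector<std::basic_string_view<CharT>> out;
    const auto sep_len = static_cast<std::size_t>(slast - sfirst);
    if (sep_len == 0) {  // empty separator: return the whole input as one piece
        out.emplace_back(first, static_cast<std::size_t>(last - first));
        return out;
    }
    for (;;) {
        const CharT* hit = std::search(first, last, sfirst, slast);
        out.emplace_back(first, static_cast<std::size_t>(hit - first));
        if (hit == last) break;
        first = hit + sep_len;  // continue right after the separator
    }
    return out;
}

// Hypothetical blank test used by strip (space, tab, LF, CR only).
template <class CharT>
bool is_blank(CharT c) {
    return c == CharT(' ') || c == CharT('\t') || c == CharT('\n') || c == CharT('\r');
}

// Hypothetical strip of a single view: trims blanks from both ends, no copy.
template <class CharT>
std::basic_string_view<CharT> strip(std::basic_string_view<CharT> s) {
    std::size_t b = 0, e = s.size();
    while (b < e && is_blank(s[b])) ++b;
    while (e > b && is_blank(s[e - 1])) --e;
    return s.substr(b, e - b);
}

// Hypothetical join: concatenates parts (const CharT*, string_view or string)
// with sep between consecutive parts.
template <class Range, class CharT>
std::basic_string<CharT> join(const Range& parts, std::basic_string_view<CharT> sep) {
    std::basic_string<CharT> out;
    bool first = true;
    for (const auto& p : parts) {
        if (!first) out.append(sep);
        out.append(std::basic_string_view<CharT>(p));
        first = false;
    }
    return out;
}
```

Because split only hands out views, nothing is copied until join builds its result string.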

This needs discussion; once we agree, I'll make a corresponding PR.

Xecutor, Dec 01 '19 14:12

It seems like join is already present as chain?

I would suggest expressing split (or split_by?) as taking an input range instead of iterator pairs. Since strings can be implicitly converted to ranges, this would work:

```cpp
auto strings = "a,b"sv | split_by(",") | to_vector(); // {"a"sv, "b"sv}
```

simonask, Dec 02 '19 07:12

I wasn't clear enough: I meant join with a separator, the inverse of split. For example, vector{{"aa","bb","cc"}} | join(":") yields the std::string "aa:bb:cc".

Taking an arbitrary input range in split_by seems somewhat problematic: std::search expects each sequence as a pair of forward iterators, and both iterators of a pair must have the same type.
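To make the constraint concrete: std::search accepts a different iterator type for each of its two sequences, but each sequence must be a [first, last) pair of one forward-iterator type. A split over an arbitrary forward range (a std::list<char> here) is still possible, but each piece can only be handed back as a pair of iterators rather than a string_view, because the characters are not stored contiguously. The function name split_pieces is hypothetical, and the sketch assumes begin() and end() return the same type (no sentinels).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic split over any forward range (e.g. std::list<char>).
// Each piece is returned as a [begin, end) iterator pair because the elements
// need not be contiguous, so a string_view cannot be formed from them.
template <class Range, class Sep>
auto split_pieces(const Range& input, const Sep& sep) {
    using It = decltype(std::begin(input));
    std::vector<std::pair<It, It>> out;
    It first = std::begin(input);
    const It last = std::end(input);
    const auto sfirst = std::begin(sep);
    const auto slast = std::end(sep);
    // For plain forward iterators std::distance walks the sequence (O(n)).
    const auto sep_len = std::distance(sfirst, slast);
    if (sep_len == 0) {  // empty separator: the whole input is one piece
        out.emplace_back(first, last);
        return out;
    }
    for (;;) {
        It hit = std::search(first, last, sfirst, slast);
        out.emplace_back(first, hit);
        if (hit == last) break;
        first = std::next(hit, sep_len);  // skip over the separator
    }
    return out;
}
```

Turning a piece back into text, for example with std::string(piece.first, piece.second), requires a copy.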

Xecutor, Dec 02 '19 15:12

To be consistent with the rest of the library, join should return a range that yields the separator at every other position (part, separator, part, ...); a sink can then be used to build a string from it. The same goes for split: it should take a range and return a range of ranges, like in_groups_of.
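A minimal sketch of that shape, assuming a simple pull protocol in which a range exposes next() returning a std::optional of its element (this is an illustration, not necessarily the internal interface rx-ranges uses). interleave_range and to_string_sink are hypothetical names: the range only yields views of the parts with the separator at every other position, and the sink is the only place that allocates the final string.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical lazy "join" stage: yields part, sep, part, sep, ..., part.
// The parts vector must outlive the range (it is held by reference).
class interleave_range {
public:
    interleave_range(const std::vector<std::string>& parts, std::string_view sep)
        : parts_(parts), sep_(sep) {}

    // Returns the next element, or std::nullopt once the range is exhausted.
    std::optional<std::string_view> next() {
        if (index_ >= parts_.size()) return std::nullopt;
        if (emit_sep_) {
            emit_sep_ = false;
            return sep_;
        }
        std::string_view part = parts_[index_++];
        emit_sep_ = index_ < parts_.size();  // separator only between parts
        return part;
    }

private:
    const std::vector<std::string>& parts_;
    std::string_view sep_;
    std::size_t index_ = 0;
    bool emit_sep_ = false;
};

// Hypothetical sink: drains any range with next() into one std::string.
template <class Range>
std::string to_string_sink(Range r) {
    std::string out;
    while (auto piece = r.next()) out.append(*piece);
    return out;
}
```

Keeping the concatenation in the sink means the same interleaving stage could feed other sinks (a stream, a vector) without change.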

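strip, on the other hand, seems hard to implement as a lazy range stage (given its definition in the Python standard library, where str.strip removes characters from both ends). Trimming the end is not possible without lookahead: a run of strippable characters can only be dropped once we know nothing else follows it, so those elements must be buffered until then, which means an allocation.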
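The lookahead problem shows up clearly in a single-pass sketch. Under the same assumed next()/std::optional pull protocol (char_source, lazy_strip and drain are hypothetical names), leading blanks can be dropped immediately, but a run of blanks after the first non-blank character has to be held in a std::string until the stage learns whether a non-blank follows (interior, so emit it) or the input ends (trailing, so drop it). That buffer is the allocation mentioned above; only space, tab, LF and CR count as blanks here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical single-pass source yielding one char at a time.
class char_source {
public:
    explicit char_source(std::string_view s) : s_(s) {}
    std::optional<char> next() {
        if (pos_ == s_.size()) return std::nullopt;
        return s_[pos_++];
    }
private:
    std::string_view s_;
    std::size_t pos_ = 0;
};

// Hypothetical lazy strip stage over a single-pass source.
template <class Source>
class lazy_strip {
public:
    explicit lazy_strip(Source src) : src_(std::move(src)) {}

    std::optional<char> next() {
        // First drain a buffered blank run that turned out to be interior.
        if (flush_pos_ < pending_.size()) return pending_[flush_pos_++];
        pending_.clear();
        flush_pos_ = 0;
        while (auto c = src_.next()) {
            if (is_blank(*c)) {
                if (started_) pending_.push_back(*c);  // maybe trailing: hold it
                continue;                              // leading blanks: drop
            }
            started_ = true;
            if (pending_.empty()) return c;
            pending_.push_back(*c);         // run confirmed interior: emit it,
            return pending_[flush_pos_++];  // followed by this non-blank char
        }
        pending_.clear();  // input ended: any held blanks were trailing
        return std::nullopt;
    }

private:
    static bool is_blank(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    Source src_;
    std::string pending_;  // the lookahead buffer (heap allocation if long)
    std::size_t flush_pos_ = 0;
    bool started_ = false;
};

// Hypothetical sink collecting the chars into a std::string.
template <class Range>
std::string drain(Range r) {
    std::string out;
    while (auto c = r.next()) out.push_back(*c);
    return out;
}
```

The size of pending_ is bounded only by the longest blank run in the input, so the stage cannot promise allocation-free operation.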

Hazurl, Dec 02 '19 20:12

If split_by should produce a range of string_views, those string_views have to be constructed from a pointer and a size. I tried to do this with a range as the input to split_by, but it looks quite cumbersome: the range-based iterators are strictly forward iterators, so taking the distance between two of them means walking the range, and the underlying memory is not necessarily contiguous, so there may be no valid pointer to build a string_view from. The original idea was that split(" hello , world , foo , bar", ",") | strip() | to_vector(); would produce a vector of string_views into the original string. Probably I'm missing something...
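One possible way around this, sketched here as an assumption rather than a recommendation: accept a range as input but require contiguous storage, detected through std::data and std::size (C++17, <iterator>). Those give exactly the pointer and size a string_view needs, and containers without data(), such as std::list<char>, are rejected at compile time via SFINAE. The split_by below is a hypothetical eager version returning a std::vector of views into the caller's buffer; stripping each piece would be a separate step.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical split_by restricted to contiguous char inputs. The trailing
// return type only compiles if std::data and std::size work on the input,
// so e.g. std::list<char> is removed from overload resolution.
// Pass std::string / std::string_view: for a raw char array std::size
// would also count the terminating '\0'.
template <class Range>
auto split_by(const Range& input, std::string_view sep)
    -> decltype((void)std::data(input), (void)std::size(input),
                std::vector<std::string_view>{}) {
    std::string_view text(std::data(input), std::size(input));
    std::vector<std::string_view> out;
    if (sep.empty()) {  // empty separator: whole input as one piece
        out.push_back(text);
        return out;
    }
    std::size_t pos = 0;
    for (;;) {
        std::size_t hit = text.find(sep, pos);
        if (hit == std::string_view::npos) {
            out.push_back(text.substr(pos));  // last piece
            break;
        }
        out.push_back(text.substr(pos, hit - pos));
        pos = hit + sep.size();
    }
    return out;
}
```

Every returned view points into the caller's string, so the string must outlive the vector.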

Xecutor, Dec 05 '19 16:12