
Add more split functions to List, Seq, and other collection modules

Open baronfel opened this issue 9 years ago • 18 comments

Submitted by Calogyne on 4/23/2016 12:00:00 AM
3 votes on UserVoice prior to migration

Haskell has the Data.List.Split library (https://hackage.haskell.org/package/split-0.1.1/docs/Data-List-Split.html), which supports multiple strategies for list splitting. I think those functions can come in handy in some scenarios, and they cannot easily be built by chaining existing functions, so should they be included in the core library?

Original UserVoice Submission Archived Uservoice Comments

baronfel avatar Oct 20 '16 01:10 baronfel

More information about the proposed functions is needed

dsyme avatar Oct 29 '16 12:10 dsyme

Closing as no new information has been provided

dsyme avatar Mar 01 '17 14:03 dsyme

Now that this suggestion has been rejected, I want to let you know that I once did an initial port of that library, so if someone is interested in turning it into something serious, please go ahead.

I personally took some inspiration from there and added some split functions to F#+. At the moment I feel very comfortable with those functions, since they have a unified signature across collections and, of course, there is also a generic version that uses overloading to pick the right implementation.

gusty avatar Mar 01 '17 16:03 gusty

@gmpl If you would like to spec out a set of additions as a reply here, I can re-open

dsyme avatar Mar 03 '17 20:03 dsyme

@dsyme I can share the spec I did for F#+. Since there the end goal is to have a generic version that works on any collection (Seq, Array, List, String) of each function, it requires consistent (non-generic) versions across all collections, which could be a good addition to the F# core lib.

So here's a sample file from split and intercalate.

By looking at both the existing .NET functions and the Haskell library I decided to provide a mechanism to:

  • Split on one or many consecutive elements
  • Specify more than one possible separator
  • Rebuild the original collection with an inverse (Join-like) function

and decided not to provide a way to:

  • specify additional options, like RemoveBlanks (this could be done later with a filter)
  • provide other splitting strategies that are not based on separators (F# already has functions for some of these, e.g. splitAt)

So the split functions will have the signature seq<[Collection]> -> [Collection] -> seq<[Collection]>, where [Collection] can be String, Seq, List, or Array.

As an example:

List.split [ [-1; 0; -1]; [10] ] [3; 5; 10; 7; -1; 0; -1; 9; 0; 10; 4]
// seq [ [3; 5]; [7]; [9; 0]; [4] ]

To 'join' the collection we can obviously specify only one separator:

List.intercalate [-1; 0; -1] (seq [ [3; 5]; [7]; [9; 0]; [4] ])
// [3; 5; -1; 0; -1; 7; -1; 0; -1; 9; 0; -1; 0; -1; 4]

Note that String.intercalate sep str is the curried version of String.Join(sep, str)
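The List versions described above can be sketched as follows. This is only an illustration of the described behaviour, not the F#+ implementation: the names splitList, intercalateList and tryStripPrefix are hypothetical, and trying separators in the order given while ignoring empty separators is an assumption.

```fsharp
/// If `prefix` is a prefix of `source`, returns what follows it.
let rec tryStripPrefix prefix source =
    match prefix, source with
    | [], rest -> Some rest
    | p :: ps, x :: xs when p = x -> tryStripPrefix ps xs
    | _ -> None

/// Splits `source` wherever any of `separators` occurs.
/// Assumption: separators are tried in the order given; empty ones are ignored.
let splitList (separators: seq<'T list>) (source: 'T list) : seq<'T list> =
    let seps = separators |> Seq.filter (not << List.isEmpty) |> Seq.toList
    let rec loop current remaining acc =
        match remaining with
        | [] -> List.rev (List.rev current :: acc)
        | x :: xs ->
            match List.tryPick (fun sep -> tryStripPrefix sep remaining) seps with
            | Some rest -> loop [] rest (List.rev current :: acc)
            | None -> loop (x :: current) xs acc
    loop [] source [] |> Seq.ofList

/// Concatenates the chunks, placing `separator` between consecutive chunks.
let intercalateList (separator: 'T list) (source: seq<'T list>) : 'T list =
    source
    |> Seq.mapi (fun i chunk -> if i = 0 then chunk else separator @ chunk)
    |> List.concat

let parts = splitList [ [-1; 0; -1]; [10] ] [3; 5; 10; 7; -1; 0; -1; 9; 0; 10; 4]
// seq [ [3; 5]; [7]; [9; 0]; [4] ]
let rebuilt = intercalateList [-1; 0; -1] parts
// [3; 5; -1; 0; -1; 7; -1; 0; -1; 9; 0; -1; 0; -1; 4]
```

Because the two separators differ, the round trip does not reproduce the original list here; it would only do so when splitting on a single separator.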

As an alternative design, the split function could be named splitOnAny, with another version called splitOn that accepts only one separator. Then splitOn sep is the inverse of intercalate sep (except for cases where the separator already occurs in the collection).

I have also included some related functions, like intersperse (insert an element between the elements of the collection) and replace (replace a sequence of elements with another sequence of elements). Here are the signatures:

  • intersperse : 'T -> [Collection<'T>] -> [Collection<'T>] (for strings it's char -> string -> string)

  • replace : [Collection] -> [Collection] -> [Collection] -> [Collection]
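A minimal sketch of these two signatures for lists. The names intersperseList and replaceList are hypothetical, and replacing non-overlapping occurrences from left to right is an assumption the thread does not spell out.

```fsharp
/// Inserts `element` between consecutive elements of `source`.
let intersperseList (element: 'T) (source: 'T list) : 'T list =
    match source with
    | [] -> []
    | head :: tail -> head :: List.collect (fun x -> [element; x]) tail

/// If `prefix` is a prefix of `source`, returns what follows it.
let rec stripPrefix prefix source =
    match prefix, source with
    | [], rest -> Some rest
    | p :: ps, x :: xs when p = x -> stripPrefix ps xs
    | _ -> None

/// Replaces each occurrence of `oldValue` with `newValue`.
/// Assumption: occurrences are matched left to right without overlapping,
/// and an empty `oldValue` leaves the source unchanged.
let replaceList (oldValue: 'T list) (newValue: 'T list) (source: 'T list) : 'T list =
    if List.isEmpty oldValue then source
    else
        let rec loop remaining acc =
            match remaining with
            | [] -> List.rev acc
            | x :: xs ->
                match stripPrefix oldValue remaining with
                | Some rest -> loop rest (List.rev newValue @ acc)
                | None -> loop xs (x :: acc)
        loop source []

let interspersed = intersperseList 0 [1..4]            // [1; 0; 2; 0; 3; 0; 4]
let replaced = replaceList [2; 3] [9] [1; 2; 3; 4; 2; 3] // [1; 9; 4; 9]
```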

Also note that this spec provides some interesting String implementations for some of the functions listed in this popular suggestion, particularly the most discussed one, String.split.

I'm fine if this suggestion is not accepted, because I use F#+ in my projects, so I have all this functionality readily available (plus the generic versions).

But I'm also happy to share the spec of what I did there, in case you think it could be a nice addition to the language. For me it simplified my life: I no longer have to remember each overload of the old .NET split function, which are inconsistent, and I can reuse the same logic across collections.

gusty avatar Mar 14 '17 06:03 gusty

@gusty OK, thanks! That's very helpful

If you have 10 minutes please jot down the full signature of the set of things you think might reasonably be added to FSharp.Core, with a sample call for each?

dsyme avatar Mar 14 '17 15:03 dsyme

@dsyme here's the set of functions to be added:

// split the source on any of the separators specified
split separators source

// intercalate the separator between the elements
intercalate separator source

// intersperse the element between the source elements
intersperse element source

// replace a sequence of elements
replace oldValue newValue source

Here are the signatures:

Seq module

Seq.split : seq<#seq<'T>> -> seq<'T> -> seq<seq<'T>>

Seq.intercalate : seq<'T> -> seq<seq<'T>> -> seq<'T>

Seq.intersperse : 'T -> seq<'T> -> seq<'T>

Seq.replace : seq<'T> -> seq<'T> -> seq<'T> -> seq<'T>

List module

List.split : seq<List<'T>> -> List<'T> -> seq<List<'T>>

List.intercalate : List<'T> -> seq<List<'T>> -> List<'T>

List.intersperse : 'T -> List<'T> -> List<'T>

List.replace : List<'T> -> List<'T> -> List<'T> -> List<'T>

Array module

Array.split : seq<'T []> -> 'T [] -> seq<'T []>

Array.intercalate : 'T [] -> seq<'T []> -> 'T []

Array.intersperse : 'T -> 'T [] -> 'T []

Array.replace : 'T [] -> 'T [] -> 'T [] -> 'T []

String module

String.split : seq<string> -> string -> seq<string>

String.intercalate : string -> seq<string> -> string

String.intersperse : char -> string -> string

String.replace : string -> string -> string -> string

Sample code:

List.split (seq [ [-1; 0; -1]; [10] ]) [3; 5; 10; 7; -1; 0; -1; 9; 0; 10; 4]
// seq [ [3; 5]; [7]; [9; 0]; [4] ]

String.split [" "; ", "] "This is a sample text, showing how to split and join collections"
// seq ["This"; "is"; "a"; "sample"; "text"; "showing"; "how"; "to"; "split"; "and"; "join"; "collections"]

Array.intercalate [|0;1|] (seq [[|2;3;4|]; [|20;30;40|]; [|200;300;400|]])
// [|2; 3; 4; 0; 1; 20; 30; 40; 0; 1; 200; 300; 400|]

String.intercalate " <-> " (seq ["first";"second";"third"])
// "first <-> second <-> third"

List.intersperse 0 [1..4]
// [1; 0; 2; 0; 3; 0; 4]

String.intersperse ',' "abcd"
// "a,b,c,d"
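For the String module, the three samples above could plausibly be written as thin curried wrappers over existing .NET methods. This is a sketch under that assumption, not necessarily how F#+ implements them; the names splitString, intercalateString and intersperseString are hypothetical.

```fsharp
open System

/// Splits on any of the given separators (wraps String.Split).
let splitString (separators: seq<string>) (source: string) : seq<string> =
    source.Split(Seq.toArray separators, StringSplitOptions.None) |> Seq.ofArray

/// Joins the strings with the separator (curried String.Join).
let intercalateString (separator: string) (source: seq<string>) : string =
    String.Join(separator, source)

/// Inserts a character between the characters of the string.
let intersperseString (element: char) (source: string) : string =
    String.Join(string element, source |> Seq.map string)

let words = splitString [" "; ", "] "a b, c"                         // seq ["a"; "b"; "c"]
let joined = intercalateString " <-> " ["first"; "second"; "third"]  // "first <-> second <-> third"
let spaced = intersperseString ',' "abcd"                            // "a,b,c,d"
```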

I noticed an RFC was created for some of these functions in the string module, particularly split, but I think they're going more or less in the same direction.

gusty avatar Jun 13 '17 21:06 gusty

I like the idea of providing a unified toolbox of functions across ordered collections and string (treated as a seq). Such unification is a great boon to anyone learning F#, and reduces the API surface area.

There are some legacy issues. For example, concat has an inconsistent definition across List, Array and String. It is unexpected in String because it includes a separator and is therefore really intercalate. Everywhere else it is the expected concatenation function. While learning F#, this inconsistency caught me out many times, making it difficult for me to remember the precise definition of concat.
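The difference is easy to see with the existing FSharp.Core functions:

```fsharp
// String.concat takes a separator, so it behaves like a join/intercalate.
let withSeparator = String.concat ", " ["a"; "b"; "c"]   // "a, b, c"

// Plain concatenation of strings needs an explicit empty separator.
let plain = String.concat "" ["a"; "b"; "c"]              // "abc"

// List.concat and Array.concat take no separator and simply flatten.
let flatList = List.concat [[1; 2]; [3]; [4; 5]]          // [1; 2; 3; 4; 5]
let flatArray = Array.concat [[|1; 2|]; [|3|]]            // [|1; 2; 3|]
```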

Personally I'd like to relabel String.concat (with separator) as String.join and deprecate String.concat with a view to eventually having a consistent API, but this might be a step too far.

I don't mind intercalate for join. Join marries well with split, and for many is expected. Intercalate is coherent with intersperse.

I do think that replace here on collections needs some thought. It would be more consistent with other collection functions, and also more general, to have signature:

let List.replace: pred: ('a -> bool) -> replacement: 'a -> src: 'a list -> 'a list

Or you might argue that a mapPart function, even more general, would be more useful:

let List.mapPart: pred: ('a -> bool) -> mapping: ('a -> 'b) -> src: 'a list -> 'b list
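As a rough sketch, both predicate-based variants reduce to a List.map. The names replaceWhere and mapWhere are hypothetical. Note one assumption: for elements that fail the predicate to be kept unchanged, the mapping here is restricted to 'a -> 'a, which is narrower than the 'a -> 'b signature written above.

```fsharp
/// Replaces every element satisfying `pred` with `replacement`.
let replaceWhere (pred: 'a -> bool) (replacement: 'a) (src: 'a list) : 'a list =
    src |> List.map (fun x -> if pred x then replacement else x)

/// Applies `mapping` only to elements satisfying `pred`.
/// Assumption: the result type equals the element type, so that
/// non-matching elements can be passed through as they are.
let mapWhere (pred: 'a -> bool) (mapping: 'a -> 'a) (src: 'a list) : 'a list =
    src |> List.map (fun x -> if pred x then mapping x else x)

let zeroedNegatives = replaceWhere (fun x -> x < 0) 0 [1; -2; 3; -4]   // [1; 0; 3; 0]
let doubledEvens = mapWhere (fun x -> x % 2 = 0) (fun x -> x * 2) [1; 2; 3; 4] // [1; 4; 3; 8]
```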

All of these functions, including replace, can be implemented in other ways so what is helpful to add should I guess be a combination of usage and obviousness. What I mean by this is that a function that is expected from name and the rest of the API is less burdensome and more likely to be used than one with a non-obvious name and/or which is not coherent with the rest of the API.

The same consideration applies to the other functions here.

tomcl avatar Jun 14 '17 11:06 tomcl

@tomcl I'm very aware of the concat inconsistency, but I don't think join is a good alternative and will tell you why:

  • In most Category Theory abstractions, concat comes from Monoid (or from abstractions like Alternative and MonadPlus, which are also monoids) and join comes from Monad. For list-like types, however, both functions do the same thing: they collapse a list of lists into a single list.

  • Aside from the abstractions, looking at their names it's not evident that in the process of collapsing, they will intercalate a separator.

gusty avatar Jun 14 '17 12:06 gusty

I'm happy to defer to you over nomenclature, because I find it difficult to reconcile the expectations of different audiences.

F# audience is not all (maybe even not primarily) those familiar with Haskell, and the abstraction nomenclature there. But it is true that coherence with that world would be desirable.

My primary principle would be internal coherence within F#, and the use of names there that are as descriptive as possible.

tomcl avatar Jun 14 '17 15:06 tomcl

@tomcl I would also prefer coherence with existing F# names over coherence with Haskell or another non-dotnet language.

The problem in this particular case is that I haven't found such coherence in existing F# names, for the reasons explained above. So I picked intercalate, not just because I found it in a Haskell library, but because the name describes more precisely what it does.

gusty avatar Jun 14 '17 15:06 gusty

@tomcl Regarding the replace function, I applied the same criteria as for the split functions I mentioned in my first message, namely:

not provide other splitting strategies, not based on separators

That was my decision, because otherwise it will become a dedicated library, like the Haskell lib mentioned there.

Apart from that, I think it covers most use cases, and you can always use the standard .NET functions to do something more complex, including regex.

gusty avatar Jun 14 '17 16:06 gusty

Reopening as information has been provided

dsyme avatar Jun 16 '17 14:06 dsyme

Re names - For these I would expect more explicit names like

Array.splitAtAny : seq<'T>  -> 'T [] -> seq<'T []>

Array.splitAtAnySubsequence : seq<'T []> -> 'T [] -> seq<'T []>

Array.intersperseMany : 'T [] -> seq<'T []> -> 'T []

Array.intersperseOne : 'T -> 'T [] -> 'T []

Array.replaceAtIndex : int -> 'T -> 'T [] -> 'T [] // may as well add this, we should always have had it

Array.replaceOne : 'T -> 'T -> 'T [] -> 'T []

Array.replaceSubsequence : 'T [] -> 'T [] -> 'T [] -> 'T []

Not saying these are definitive names, just giving an indication

dsyme avatar Jun 16 '17 15:06 dsyme

I agree with having more explicit names, but splitAtAnySubsequence is a bit too long.

Also note that splitAt exists already in Core, and it means splitting at an index.

What about splitOnAnyElement instead of splitAtAny, and splitOnAny instead of splitAtAnySubsequence? I mean the more generic name for the more generic function.

Then we can also include, for completeness:

Array.splitOnElement : 'T -> 'T [] -> seq<'T []>

Array.splitOn : 'T [] -> 'T [] -> seq<'T []>
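To make the naming distinction concrete: Array.splitAt already exists and splits at an index, whereas the element-based variants proposed here would split on values. The helpers below are a hypothetical sketch of splitOnElement and splitOnAnyElement, not FSharp.Core functions.

```fsharp
// Existing FSharp.Core function: splits at an index.
let before, after = Array.splitAt 2 [|1; 2; 3; 4|]   // ([|1; 2|], [|3; 4|])

/// Hypothetical helper: splits wherever an element equals any of `separators`.
let splitOnAnyElement (separators: seq<'T>) (source: 'T[]) : seq<'T[]> =
    let seps = Seq.toArray separators
    let chunks = ResizeArray<'T[]>()
    let current = ResizeArray<'T>()
    for x in source do
        if Array.contains x seps then
            // Close the current chunk and start a new one.
            chunks.Add(current.ToArray())
            current.Clear()
        else
            current.Add x
    chunks.Add(current.ToArray())
    Seq.ofArray (chunks.ToArray())

/// Hypothetical helper: the single-separator case.
let splitOnElement (separator: 'T) (source: 'T[]) : seq<'T[]> =
    splitOnAnyElement [separator] source

let onZero = splitOnElement 0 [|1; 2; 0; 3; 0; 4|]              // seq [[|1; 2|]; [|3|]; [|4|]]
let onZeroOrNine = splitOnAnyElement [0; 9] [|1; 0; 2; 9; 3|]   // seq [[|1|]; [|2|]; [|3|]]
```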

gusty avatar Jun 17 '17 07:06 gusty

Perhaps this:

Array.splitAt : int -> 'T[] -> 'T[] * 'T[]

Array.splitOn : 'T  -> 'T [] -> 'T [] []

Array.splitOnAny : seq<'T>  -> 'T [] -> 'T [] []

Array.splitOnAnySegment : seq<'T []> -> 'T [] -> 'T [] []

Array.insertAt : int -> 'T -> 'T [] -> 'T []

Array.insertSegmentAt : int -> 'T[] -> 'T [] -> 'T []

Array.intersperse : 'T -> 'T [] -> 'T []

Array.intersperseSegment : 'T [] -> seq<'T []> -> 'T []

Array.replace : 'T -> 'T -> 'T [] -> 'T [] // Q. replace all or replace first?

Array.replaceAt : int -> 'T -> 'T [] -> 'T [] // may as well add this, we should always have had it

Array.replaceSegment : 'T [] -> 'T [] -> 'T [] -> 'T []

Array.remove : 'T -> 'T [] -> 'T [] // Q. remove all or remove first?

Array.removeAt : int -> 'T [] -> 'T []

Array.removeSegment : 'T [] -> 'T [] -> 'T []

Though Segment is not current terminology

dsyme avatar Jun 19 '17 10:06 dsyme

Can we use Slice in lieu of Segment?

gusty avatar Jul 19 '19 22:07 gusty

I'd prefer Slice instead of Segment for the name

cartermp avatar Jul 19 '19 22:07 cartermp

Some of this was done in F# 6; these remain:

Array.splitOn : 'T  -> 'T [] -> 'T [] []

Array.splitOnAny : seq<'T>  -> 'T [] -> 'T [] []

Array.splitOnAnyChunk : seq<'T []> -> 'T [] -> 'T [] []

Array.intersperse : 'T -> 'T [] -> 'T []

Array.intersperseMany : 'T [] -> seq<'T []> -> 'T []

dsyme avatar Oct 28 '22 13:10 dsyme

Closing this as the remaining work isn't particularly significant and can easily be added as helpers

dsyme avatar Jan 09 '23 21:01 dsyme