Add more split functions to List, Seq, and other collection modules
Submitted by Calogyne on 4/23/2016 12:00:00 AM
3 votes on UserVoice prior to migration
Haskell has the Data.List.Split (https://hackage.haskell.org/package/split-0.1.1/docs/Data-List-Split.html) library, which supports multiple strategies for splitting lists. I think those functions can come in handy in some scenarios, and since they cannot easily be built by chaining existing functions, should they be included in the core library?
More information about the proposed functions is needed
Closing as no new information has been provided
Now that this suggestion has been rejected, I want to let you know that I once did an initial port of that library, so if someone is interested in turning it into something serious, please go ahead.
I personally took some inspiration from it and added some split functions to F#+. At the moment I feel very comfortable with those functions, since they have a unified signature across collections and, of course, there is also a generic version that uses overloading to pick the right implementation.
@gmpl If you would like to spec out a set of additions as a reply here, I can reopen.
@dsyme I can share the spec I did for F#+. Since the end goal there is to have a generic version of each function that works on any collection (Seq, Array, List, String), it requires consistent (non-generic) versions across all collections, which could be a good addition to the F# core library.
So here's a sample file for split and intercalate.
By looking at both the existing .NET functions and the Haskell library I decided to provide a mechanism to:
- Split on one or many consecutive elements
- Specify more than one possible separator
- Rebuild the original collection with an inverse (Join-like) function
and decided not to provide a way to:
- specify additional options, like RemoveBlanks (this could be done later with a filter)
- provide other splitting strategies not based on separators (F# already has some functions that provide other splitting strategies, e.g. splitAt)
So the split functions will have the signature seq<[Collection]> -> [Collection] -> seq<[Collection]>, where [Collection] could be String, Seq, List or Array.
As an example:
List.split [ [-1; 0; -1]; [10] ] [3; 5; 10; 7; -1; 0; -1; 9; 0; 10; 4]
// seq [ [3; 5]; [7]; [9; 0]; [4] ]
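A minimal sketch of the split semantics in the example above, written as a standalone F# script. It assumes separators are matched left to right, that the first separator in the given order wins when several match at the same position, and that empty separators are ignored. The module name ListSplit is hypothetical, the sketch returns a list of lists rather than a seq for simplicity, and it is not the F#+ implementation.

```fsharp
module ListSplit =
    // True when `xs` begins with all elements of `prefix`, in order.
    let rec startsWith (prefix: 'T list) (xs: 'T list) : bool =
        match prefix, xs with
        | [], _ -> true
        | _, [] -> false
        | p :: ps, y :: ys -> p = y && startsWith ps ys

    // Split `source` wherever any of the non-empty `separators` occurs.
    let split (separators: seq<'T list>) (source: 'T list) : 'T list list =
        let seps = separators |> Seq.filter (List.isEmpty >> not) |> Seq.toList
        // `current` is the chunk being built and `acc` the finished chunks,
        // both kept in reverse order and flipped at the end.
        let rec go (current: 'T list) (acc: 'T list list) (rest: 'T list) =
            match rest with
            | [] -> List.rev (List.rev current :: acc)
            | x :: tail ->
                match seps |> List.tryFind (fun s -> startsWith s rest) with
                | Some sep -> go [] (List.rev current :: acc) (List.skip sep.Length rest)
                | None -> go (x :: current) acc tail
        go [] [] source

let example = ListSplit.split [ [-1; 0; -1]; [10] ] [3; 5; 10; 7; -1; 0; -1; 9; 0; 10; 4]
// [[3; 5]; [7]; [9; 0]; [4]]
```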
To 'join' the collection we can obviously specify only one separator:
List.intercalate [-1; 0; -1] (seq [ [3; 5]; [7]; [9; 0]; [4] ])
// [3; 5; -1; 0; -1; 7; -1; 0; -1; 9; 0; -1; 0; -1; 4]
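A corresponding sketch of List.intercalate (the module name ListIntercalate is hypothetical). Intercalating with a single separator does not restore the 10s that the split removed:

```fsharp
module ListIntercalate =
    // Put `separator` between consecutive lists of `source`, then flatten.
    let intercalate (separator: 'T list) (source: seq<'T list>) : 'T list =
        source
        |> Seq.mapi (fun i xs -> if i = 0 then xs else separator @ xs)
        |> List.concat

let joined = ListIntercalate.intercalate [-1; 0; -1] [ [3; 5]; [7]; [9; 0]; [4] ]
// [3; 5; -1; 0; -1; 7; -1; 0; -1; 9; 0; -1; 0; -1; 4]
```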
Note that String.intercalate sep str is the curried version of String.Join(sep, str)
As an alternative design, the split function could be named splitOnAny and be accompanied by a version that accepts only one separator, called splitOn, so that splitOn sep is the inverse of intercalate sep (except when the separator already occurs inside one of the joined elements).
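The round-trip property, and its exception, can be seen with existing functions used as stand-ins: FSharp.Core's String.concat (the curried String.Join) plays the role of intercalate, and .NET's String.Split with a single separator plays the role of splitOn. This is only an analogy, not the proposed API.

```fsharp
open System

// Joining and then splitting on the same separator recovers the parts...
let parts = [ "a"; "b"; "c" ]
let roundTrip = (String.concat "," parts).Split([| "," |], StringSplitOptions.None)
// [|"a"; "b"; "c"|]

// ...but not when a part already contains the separator.
let tricky = [ "a"; "b,c" ]
let brokenTrip = (String.concat "," tricky).Split([| "," |], StringSplitOptions.None)
// [|"a"; "b"; "c"|], three parts instead of the original two
```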
I have also included some related functions, like intersperse (insert an element between each pair of adjacent elements) and replace (replace a sequence of elements with another sequence of elements). Here are the signatures:
- intersperse : 'T -> [Collection<'T>] -> [Collection<'T>] (for strings it's char -> string -> string)
- replace : [Collection] -> [Collection] -> [Collection] -> [Collection]
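A minimal sketch of these two functions for lists, assuming replace substitutes every non-overlapping occurrence while scanning left to right, and leaves the source unchanged when oldValue is empty. The module name ListExtras is hypothetical.

```fsharp
module ListExtras =
    // Put `sep` between each pair of adjacent elements.
    let intersperse (sep: 'T) (source: 'T list) : 'T list =
        match source with
        | [] -> []
        | x :: rest -> x :: (rest |> List.collect (fun y -> [ sep; y ]))

    // True when `xs` begins with all elements of `prefix`, in order.
    let rec startsWith (prefix: 'T list) (xs: 'T list) : bool =
        match prefix, xs with
        | [], _ -> true
        | _, [] -> false
        | p :: ps, y :: ys -> p = y && startsWith ps ys

    // Replace every non-overlapping occurrence of `oldValue` with `newValue`.
    // An empty `oldValue` leaves the source unchanged (an assumption of this sketch).
    let replace (oldValue: 'T list) (newValue: 'T list) (source: 'T list) : 'T list =
        let rec go (acc: 'T list) (rest: 'T list) =
            match rest with
            | [] -> List.rev acc
            | _ when not oldValue.IsEmpty && startsWith oldValue rest ->
                go (List.rev newValue @ acc) (List.skip oldValue.Length rest)
            | x :: tail -> go (x :: acc) tail
        go [] source

let spaced = ListExtras.intersperse 0 [1..4]          // [1; 0; 2; 0; 3; 0; 4]
let swapped = ListExtras.replace [1; 2] [9] [1; 2; 3; 1; 2]   // [9; 3; 9]
```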
Also note that this spec provides some interesting String implementations for some of the functions listed in this popular suggestion, particularly the much-discussed String.split.
I'm fine if this suggestion is not accepted, because I use F#+ in my projects, so I have all this functionality readily available (plus the generic versions).
But I'm also happy to share the spec of what I did there, in case you think it could be a nice addition to the language. It simplified my life: I no longer have to remember each of the inconsistent overloads of the old .NET Split method, and I can reuse the same logic across collections.
@gusty OK, thanks! That's very helpful
If you have 10 minutes, could you jot down the full signatures of the set of functions you think might reasonably be added to FSharp.Core, with a sample call for each?
@dsyme here's the set of functions to be added:
// split the source on any of the separators specified
split separators source
// intercalate the separator between the elements
intercalate separator source
// intersperse the element between the source elements
intersperse element source
// replace a sequence of elements
replace oldValue newValue source
Here are the signatures:
Seq module
Seq.split : seq<#seq<'T>> -> seq<'T> -> seq<seq<'T>>
Seq.intercalate : seq<'T> -> seq<seq<'T>> -> seq<'T>
Seq.intersperse :'T -> seq<'T> -> seq<'T>
Seq.replace : seq<'T> -> seq<'T> -> seq<'T> -> seq<'T>
List module
List.split : seq<List<'T>> -> List<'T> -> seq<List<'T>>
List.intercalate : List<'T> -> seq<List<'T>> -> List<'T>
List.intersperse : 'T -> List<'T> -> List<'T>
List.replace : List<'T> -> List<'T> -> List<'T> -> List<'T>
Array module
Array.split : seq<'T []> -> 'T [] -> seq<'T []>
Array.intercalate : 'T [] -> seq<'T []> -> 'T []
Array.intersperse : 'T -> 'T [] -> 'T []
Array.replace : 'T [] -> 'T [] -> 'T [] -> 'T []
String module
String.split : seq<string> -> string -> seq<string>
String.intercalate : string -> seq<string> -> string
String.intersperse : char -> string -> string
String.replace : string -> string -> string -> string
Sample code:
List.split (seq [ [-1; 0; -1]; [10] ]) [3; 5; 10; 7; -1; 0; -1; 9; 0; 10; 4]
// seq [ [3; 5]; [7]; [9; 0]; [4] ]
String.split [" "; ", "] "This is a sample text, showing how to split and join collections"
// seq ["This"; "is"; "a"; "sample"; "text"; "showing"; "how"; "to"; "split"; "and"; "join"; "collections"]
Array.intercalate [|0;1|] (seq [[|2;3;4|]; [|20;30;40|]; [|200;300;400|]])
// [|2; 3; 4; 0; 1; 20; 30; 40; 0; 1; 200; 300; 400|]
String.intercalate " <-> " (seq ["first";"second";"third"])
// "first <-> second <-> third"
List.intersperse 0 [1..4]
// [1; 0; 2; 0; 3; 0; 4]
String.intersperse ',' "abcd"
// "a,b,c,d"
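For strings, the String.split sample can be approximated today by a thin wrapper over the .NET String.Split(string[], StringSplitOptions) overload, which scans the input left to right and consumes whichever separator occurs at the current position (so ", " is consumed whole in "text, showing"). The StringSplit module is a hypothetical sketch, not an existing FSharp.Core API:

```fsharp
open System

module StringSplit =
    // Split `source` on any of `separators`, keeping empty entries,
    // by delegating to String.Split(string[], StringSplitOptions).
    let split (separators: seq<string>) (source: string) : seq<string> =
        source.Split(Seq.toArray separators, StringSplitOptions.None) :> seq<string>

let words =
    StringSplit.split [ " "; ", " ] "This is a sample text, showing how to split and join collections"
    |> List.ofSeq
// ["This"; "is"; "a"; "sample"; "text"; "showing"; "how"; "to"; "split"; "and"; "join"; "collections"]
```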
I noticed an RFC was created for some of these functions in the String module, particularly split, but I think it goes more or less in the same direction.
I like the idea of providing a unified toolbox of functions across ordered collections and string (treated as a seq). Such unification is a great boon to anyone learning F#, and reduces the API surface area.
There are some legacy issues. For example, concat has an inconsistent definition across List, Array and String. In String it is unexpected, because it takes a separator and is therefore really intercalate; everywhere else it is the expected concatenation function. While learning F#, this inconsistency tripped me up many times, making it hard to remember the precise definition of concat.
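The inconsistency is easy to see with the existing FSharp.Core functions:

```fsharp
// List.concat flattens a sequence of lists; no separator is involved.
let flattened = List.concat [ [1; 2]; [3]; [4; 5] ]      // [1; 2; 3; 4; 5]

// Array.concat behaves the same way for arrays.
let flattenedArr = Array.concat [ [| 1; 2 |]; [| 3 |] ]   // [|1; 2; 3|]

// String.concat, by contrast, takes a separator first: it is really intercalate.
let joinedStr = String.concat ", " [ "a"; "b"; "c" ]      // "a, b, c"
```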
Personally I'd like to relabel String.concat (with separator) as String.join and deprecate String.concat with a view to eventually having a consistent API, but this might be a step too far.
I don't mind intercalate instead of join. Join pairs well with split, and many people expect it; intercalate is coherent with intersperse.
I do think that replace on collections needs some thought. It would be more consistent with other collection functions, and also more general, to have the signature:
let List.replace: pred: ('a -> bool) -> replacement: 'a -> src: 'a list -> 'a list
Or you might argue that a mapPart function, even more general, would be more useful:
let List.mapPart: pred: ('a -> bool) -> mapping: ('a -> 'b) -> src: 'a list -> 'b list
All of these functions, including replace, can be implemented in other ways, so I guess what is worth adding should be decided by a combination of usage and obviousness. By this I mean that a function whose behaviour can be predicted from its name and from the rest of the API is less burdensome, and more likely to be used, than one with a non-obvious name or one that is not coherent with the rest of the API.
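As an illustration of how easily such functions can be built from existing ones, the predicate-based replace signature suggested above reduces to a single List.map. The module name ListPredicate is hypothetical:

```fsharp
module ListPredicate =
    // Every element satisfying `pred` is swapped for `replacement`;
    // the others are kept as they are.
    let replace (pred: 'a -> bool) (replacement: 'a) (src: 'a list) : 'a list =
        src |> List.map (fun x -> if pred x then replacement else x)

let clamped = ListPredicate.replace (fun x -> x < 0) 0 [3; -1; 4; -5]
// [3; 0; 4; 0]
```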
The same consideration applies to the other functions here.
@tomcl I'm very aware of the concat inconsistency, but I don't think join is a good alternative and will tell you why:
- In most category-theory abstractions, concat comes from Monoid (or from abstractions like Alternative and MonadPlus, which are also monoids) and join comes from Monad, but for list-like types both functions do the same thing: they collapse a list of lists into a single list.
- Aside from the abstractions, nothing in either name suggests that a separator will be intercalated in the process of collapsing.
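For lists, the two operations can be checked to coincide using existing FSharp.Core functions: List.concat is the monoidal concatenation, and List.collect id is the monadic join (bind with the identity function).

```fsharp
let xss = [ [1; 2]; []; [3] ]

// Monoid-style concatenation of the inner lists.
let viaConcat = List.concat xss

// Monadic join for lists: bind (List.collect) applied with the identity function.
let viaJoin = List.collect id xss

// Both give [1; 2; 3]; neither inserts a separator.
```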
I'm happy to defer to you over nomenclature, because I find it difficult to reconcile the expectations of different audiences.
The F# audience is not entirely (maybe not even primarily) made up of people familiar with Haskell and its abstraction nomenclature. But it is true that coherence with that world would be desirable.
My primary principle would be internal coherence within F#, using names that are as descriptive as possible.
@tomcl I would also prefer coherence with existing F# names over coherence with Haskell or another non-dotnet language.
The problem in this particular case is that I haven't found such coherence in existing F# names, for the reasons explained above. So I picked intercalate, not just because I found it in a Haskell library, but because the name describes more precisely what it does.
@tomcl Regarding the replace function, I applied the same criterion as for the split functions I mentioned in my first message, namely:
not provide other splitting strategies, not based on separators
That was my decision, because otherwise it would become a dedicated library, like the Haskell library mentioned earlier.
Apart from that, I think it covers most use cases, and you can always use the standard .NET functions, including regex, to do something more complex.
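For instance, a separator described by a pattern rather than by fixed values can already be handled with the .NET Regex.Split method:

```fsharp
open System.Text.RegularExpressions

// Split on runs of one or more digits; a pattern expresses separators
// that a fixed list of separator values cannot.
let pieces = Regex.Split("a1b22c333d", "[0-9]+")
// [|"a"; "b"; "c"; "d"|]
```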
Reopening as information has been provided
Re names: for these I would expect more explicit names, like:
Array.splitAtAny : seq<'T> -> 'T [] -> seq<'T []>
Array.splitAtAnySubsequence : seq<'T []> -> 'T [] -> seq<'T []>
Array.intersperseMany : 'T [] -> seq<'T []> -> 'T []
Array.intersperseOne : 'T -> 'T [] -> 'T []
Array.replaceAtIndex : int -> 'T -> 'T [] -> 'T [] // may as well add this, we should always have had it
Array.replaceOne : 'T -> 'T -> 'T [] -> 'T []
Array.replaceSubsequence : 'T [] -> 'T [] -> 'T [] -> 'T []
Not saying these are definitive names, just giving an indication
I agree with having more explicit names, but splitAtAnySubsequence is a bit too long.
Also note that splitAt already exists in FSharp.Core, where it means splitting at an index.
What about splitOnAnyElement instead of splitAtAny, and splitOnAny instead of splitAtAnySubsequence? That is, the more generic name for the more generic function.
Then we can also include, for completeness:
Array.splitOnElement : 'T -> 'T [] -> seq<'T []>
Array.splitOn : 'T [] -> 'T [] -> seq<'T []>
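A minimal sketch of the element-separator variants for arrays, using the hypothetical module name ArraySplit. It returns 'T [] [] (as in the later proposal) rather than seq<'T []>, keeps an empty chunk between adjacent separators, and uses a linear search over the separators, which is adequate for a handful of them:

```fsharp
module ArraySplit =
    // Split `source` at every element contained in `separators`;
    // the separator elements themselves are dropped.
    let splitOnAnyElement (separators: seq<'T>) (source: 'T []) : 'T [] [] =
        let seps = Array.ofSeq separators
        let chunks = ResizeArray<'T []>()
        let current = ResizeArray<'T>()
        for x in source do
            if Array.contains x seps then
                chunks.Add(current.ToArray())
                current.Clear()
            else
                current.Add x
        chunks.Add(current.ToArray())
        chunks.ToArray()

    // Single-element separator, expressed via the general version.
    let splitOnElement (separator: 'T) (source: 'T []) : 'T [] [] =
        splitOnAnyElement [ separator ] source

let byAny = ArraySplit.splitOnAnyElement [ 0; -1 ] [| 1; 2; 0; 3; -1; 4 |]
// [|[|1; 2|]; [|3|]; [|4|]|]
let byOne = ArraySplit.splitOnElement ',' ("a,b,,c".ToCharArray())
// [|[|'a'|]; [|'b'|]; [||]; [|'c'|]|]
```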
Perhaps this:
Array.splitAt : int -> 'T[] -> 'T[] * 'T[]
Array.splitOn : 'T -> 'T [] -> 'T [] []
Array.splitOnAny : seq<'T> -> 'T [] -> 'T [] []
Array.splitOnAnySegment : seq<'T []> -> 'T [] -> 'T [] []
Array.insertAt : int -> 'T -> 'T [] -> 'T []
Array.insertSegmentAt : int -> 'T[] -> 'T [] -> 'T []
Array.intersperse : 'T -> 'T [] -> 'T []
Array.intersperseSegment : 'T [] -> seq<'T []> -> 'T []
Array.replace : 'T -> 'T -> 'T [] -> 'T [] // Q. replace all or replace first?
Array.replaceAt : int -> 'T -> 'T [] -> 'T [] // may as well add this, we should always have had it
Array.replaceSegment : 'T [] -> 'T [] -> 'T [] -> 'T []
Array.remove : 'T -> 'T [] -> 'T [] // Q. remove all or remove first?
Array.removeAt : int -> 'T [] -> 'T []
Array.removeSegment : 'T [] -> 'T [] -> 'T []
Though Segment is not current terminology
Can we use Slice in-lieu of Segment ?
I'd prefer Slice instead of Segment for the name
Some of this was done in F# 6; these remain:
Array.splitOn : 'T -> 'T [] -> 'T [] []
Array.splitOnAny : seq<'T> -> 'T [] -> 'T [] []
Array.splitOnAnyChunk : seq<'T []> -> 'T [] -> 'T [] []
Array.intersperse : 'T -> 'T [] -> 'T []
Array.intersperseMany : 'T [] -> seq<'T []> -> 'T []
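For comparison, the index-based functions that did ship in FSharp.Core with F# 6 (where updateAt plays the role of the proposed replaceAt) can be used as follows; this requires FSharp.Core 6.0 or later:

```fsharp
let source = [| 1; 2; 3 |]

let inserted = Array.insertAt 1 9 source                 // [|1; 9; 2; 3|]
let removed = Array.removeAt 0 source                    // [|2; 3|]
let updated = Array.updateAt 2 0 source                  // [|1; 2; 0|]
let insertedMany = Array.insertManyAt 3 [ 4; 5 ] source  // [|1; 2; 3; 4; 5|]
```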
Closing this as the remaining work isn't particularly significant and can easily be added as helpers