scala-library-next
Consider adding `split` operations
Scalaz’s Foldable has two variants of partitioning methods, named splitWith and selectSplit:
https://github.com/scalaz/scalaz/blob/series/7.3.x/core/src/main/scala/scalaz/Foldable.scala#L303-L351
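For readers who don't want to open the link, here is a rough sketch of what the two methods do, as far as I understand the Scalaz semantics. It is restricted to `List` and returns plain lists, whereas Scalaz defines them on any `Foldable`; the object name `SplitSketch` is made up for illustration.

```scala
object SplitSketch {
  // Split into maximal consecutive runs whose elements all agree on `p`.
  def splitWith[A](xs: List[A])(p: A => Boolean): List[List[A]] =
    xs.foldRight(List.empty[List[A]]) {
      // Same predicate result as the head of the current run: extend that run.
      case (x, (run @ (y :: _)) :: rest) if p(x) == p(y) => (x :: run) :: rest
      // Otherwise (or when there is no run yet): start a new run.
      case (x, acc) => List(x) :: acc
    }

  // Keep only the runs whose elements satisfy `p`, discarding the others.
  // Runs produced by splitWith are never empty, so `head` is safe here.
  def selectSplit[A](xs: List[A])(p: A => Boolean): List[List[A]] =
    splitWith(xs)(p).filter(run => p(run.head))
}
```

With the predicate `_ % 2 == 0`, `splitWith(List(1, 3, 2, 4, 5))` gives `List(List(1, 3), List(2, 4), List(5))` and `selectSplit` gives `List(List(2, 4))`.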
Hey @markus1189, to finish this work you should rely on IsSeqLike instead (though it currently doesn’t support views). Are you interested in continuing this effort?
I am interested, will give it a try soon
FYI I’ve used splitWith as a test in this PR: https://github.com/scala/scala/pull/6674/files#diff-1ac5384a2a49471ae923396b6ee3f269R47
@julienrf Is it still supposed to be in collections-contrib and inside a SeqDecorator as it was originally? Seems like a lot of the stuff there was deleted. Where did you use splitWith in that pr? It's not there, or I am blind :wink:
Also, where can I find the IsSeqLike?
Oops, I’ve updated the PR yesterday and renamed it to groupedWith: https://github.com/scala/scala/pull/6674/files#diff-1ac5384a2a49471ae923396b6ee3f269R80
Here is the story about IsSeqLike. It’s part of the standard library but in collections-contrib we have something similar named HasSeqOps. The goal of the PR I linked is to unify them.
The idea I have in mind would be for your work to target the collections-contrib module. But you can also do a PR on scala/scala (without using IsSeqLike, since you will then be able to implement it directly in the collections) and start a discussion to have them in the standard library…
@julienrf Here's another naming quibble!
I think the names splitWith and selectSplit are terrible!
- `selectSplit` should be called `runsWhere`, as it splits a sequence into runs (i.e., consecutive subsequences) where a predicate is true;
- `splitWith` sounds like it would work like `String`'s `split` but using an element predicate instead of an actual character (as in `"ab$cd$efg".splitWith(notLetter) == Seq("ab","cd","efg")`); maybe a better name would be `runsUntilFlip`? I'm not sure the function is that useful though; I'd prefer a function that returns two distinct sequences, and it would be called `partitionRunsWhere` or `partitionRunsWith`.
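To make the `String`-like reading concrete, here is a minimal sketch of a split that drops the separator elements. The name `splitOn` and the object `SplitOnSketch` are hypothetical, and the choice to keep empty pieces is an assumption (it is not what `String.split` does by default).

```scala
object SplitOnSketch {
  // Hypothetical helper: split `xs` at every element satisfying `isSep`.
  // Separators are dropped; empty pieces between separators are kept.
  def splitOn[A](xs: List[A])(isSep: A => Boolean): List[List[A]] =
    xs.foldRight(List(List.empty[A])) {
      // A separator closes the current piece and opens a new, empty one.
      case (x, acc) if isSep(x) => Nil :: acc
      // Any other element is prepended to the current piece.
      case (x, piece :: rest) => (x :: piece) :: rest
      // Not reachable in practice: the accumulator always holds at least one piece.
      case (x, Nil) => List(List(x))
    }
}
```

For example, `SplitOnSketch.splitOn("ab$cd$efg".toList)(c => !c.isLetter).map(_.mkString)` yields `List("ab", "cd", "efg")`.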
We have precedent for an operation returning consecutive subsequences: grouped. I think we should use grouped instead of runs.
It's unfortunate, because the "group" terminology does not conceptually imply "consecutive-ity", which is a central property here. This is exacerbated by other methods like groupBy also talking about groups, but groups that are not consecutive!
In fact, it's also a useful operation to get consecutive groups sharing the same result by some function. What would you call it, then, given that groupBy is taken? groupedBy? Scalaz calls it splitBy... it could also be called splitRunsBy 🙃
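A small sketch of that "consecutive groups by key" operation, to contrast it with `groupBy`. The name `groupedBy` is just one of the candidates floated above, and the `List[(K, List[A])]` result shape is an assumption for illustration.

```scala
object GroupedBySketch {
  // Hypothetical name: group *consecutive* elements that map to the same key.
  def groupedBy[A, K](xs: List[A])(f: A => K): List[(K, List[A])] =
    xs.foldRight(List.empty[(K, List[A])]) {
      // Same key as the current run: extend it.
      case (x, (k, run) :: rest) if f(x) == k => (k, x :: run) :: rest
      // Different key (or no run yet): start a new run.
      case (x, acc) => (f(x), List(x)) :: acc
    }
}
```

On `List("a", "b", "cc", "dd", "e")` with `_.length` this gives `List((1, List("a", "b")), (2, List("cc", "dd")), (1, List("e")))`, whereas the standard `groupBy(_.length)` merges `"a"`, `"b"` and `"e"` under the single key `1`.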
They are also similar to partition :smile: How about:
| Scalaz | collections |
|---|---|
| splitWith | partitionAll |
| splitBy | partitionGroup |
| selectSplit | partitionFilter |
@joroKr21 - I think you mean span, not partition. partition does not respect order.
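For anyone else who mixes the two up, a quick illustration using the existing standard-library methods (the object name is made up): `span` cuts at the first element that fails the predicate, while `partition` gathers matching elements from the whole sequence, so the halves are no longer consecutive slices of the input.

```scala
object SpanVsPartition {
  val xs = List(2, 4, 1, 6, 3)

  // Longest prefix satisfying the predicate, then everything after it.
  val spanned = xs.span(_ % 2 == 0)

  // All matching elements, then all non-matching ones, regardless of position.
  val partitioned = xs.partition(_ % 2 == 0)
}
```

Here `spanned` is `(List(2, 4), List(1, 6, 3))` and `partitioned` is `(List(2, 4, 6), List(1, 3))`.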
Anyway, split is more appropriate for a name because that's already what it means for String.
If we want semantics substantially different from those in String, we may wish to use another word, especially if it collides with the split name on String. Note that String split drops the characters it splits on.
Other possibilities include grouped (we already do it by number), fragment, dice, chop, shatter, etc.
I do think it's important that the action word go first and the modifier go afterwards, so no selectSplit but rather splitSelect.
Yes, I meant span... I keep confusing these two methods and for the life of me I can't remember which is which.
If split is chosen then it should do the same thing as String.split: it should drop the element we split on, and it should also reproduce the (inconsistent IMO) way String.split handles empty substrings:
scala> "...".split('.')
res10: Array[String] = Array()
scala> ".42...".split('.')
res11: Array[String] = Array("", 42)
scala> ".42...13.".split('.')
res12: Array[String] = Array("", 42, "", "", 13)
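The asymmetry comes from the default limit of `0` in Java's `String.split`, which removes trailing empty strings but keeps leading and interior ones. A short sketch using the regex overload with an explicit negative limit (the object name is made up) shows the pieces that are otherwise dropped:

```scala
object StringSplitLimit {
  // Default limit (0): trailing empty strings are removed.
  val trimmed: List[String] = ".42...13.".split("\\.").toList

  // Negative limit: every piece is kept, including trailing empty strings.
  val keepAll: List[String] = ".42...13.".split("\\.", -1).toList

  // The all-separator case: empty by default, four empty pieces with limit -1.
  val dotsTrimmed: List[String] = "...".split("\\.").toList
  val dotsKeepAll: List[String] = "...".split("\\.", -1).toList
}
```

`trimmed` is `List("", "42", "", "", "13")` while `keepAll` has one more trailing `""`; `dotsTrimmed` is empty while `dotsKeepAll` is `List("", "", "", "")`. A collection-level `split` would have to pick one of these behaviours.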
I would also be happy with groupedWith, groupedBy and groupedFilter, but then groupBy vs groupedBy will be confusing. I don't like select because it is often used in relational context where it has a completely different meaning.
:+1: for the action first