scala-library-next

Consider adding `split` operations

Open julienrf opened this issue 8 years ago • 11 comments

Scalaz’s Foldable has two variants of partitioning methods, named splitWith and selectSplit:

https://github.com/scalaz/scalaz/blob/series/7.3.x/core/src/main/scala/scalaz/Foldable.scala#L303-L351
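For readers unfamiliar with the two methods, here is a minimal sketch of their behaviour on a plain `List`. It assumes the semantics described in the Scalaz scaladoc (splitWith yields runs that alternately satisfy and fail the predicate; selectSplit keeps only the satisfying runs). The `runs` helper and the object name are hypothetical, and Scalaz itself returns `NonEmptyList` groups rather than `List`.

```scala
object SplitSketch {
  // Hypothetical helper: cut `xs` into maximal runs of consecutive elements
  // on which `p` gives the same answer, tagging each run with that answer.
  // Not stack-safe for very long inputs; this is only an illustration.
  def runs[A](xs: List[A])(p: A => Boolean): List[(Boolean, List[A])] =
    xs match {
      case Nil => Nil
      case head :: _ =>
        val flag = p(head)
        val (run, rest) = xs.span(a => p(a) == flag)
        (flag, run) :: runs(rest)(p)
    }

  // All runs, in order, alternating between "fails p" and "satisfies p".
  def splitWith[A](xs: List[A])(p: A => Boolean): List[List[A]] =
    runs(xs)(p).map(_._2)

  // Only the runs whose elements satisfy `p`; everything else is dropped.
  def selectSplit[A](xs: List[A])(p: A => Boolean): List[List[A]] =
    runs(xs)(p).collect { case (true, run) => run }
}
```

With the input `List(1, 3, 2, 4, 5)` and the predicate `_ % 2 == 0`, `splitWith` gives `List(List(1, 3), List(2, 4), List(5))` and `selectSplit` gives `List(List(2, 4))`.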

julienrf avatar Sep 14 '17 13:09 julienrf

Hey @markus1189, to finish this work you should rely on IsSeqLike instead (though it currently doesn’t support views). Are you interested in continuing this effort?

julienrf avatar Jun 07 '18 07:06 julienrf

I am interested, will give it a try soon

markus1189 avatar Jun 07 '18 23:06 markus1189

FYI I’ve used splitWith as a test in this PR: https://github.com/scala/scala/pull/6674/files#diff-1ac5384a2a49471ae923396b6ee3f269R47

julienrf avatar Jun 08 '18 07:06 julienrf

@julienrf Is it still supposed to be in collections-contrib and inside a SeqDecorator as it was originally? Seems like a lot of the stuff there was deleted. Where did you use splitWith in that PR? It's not there, or I am blind :wink:

Also, where can I find the IsSeqLike?

markus1189 avatar Jun 11 '18 16:06 markus1189

Oops, I’ve updated the PR yesterday and renamed it to groupedWith: https://github.com/scala/scala/pull/6674/files#diff-1ac5384a2a49471ae923396b6ee3f269R80

Here is the story about IsSeqLike. It’s part of the standard library but in collections-contrib we have something similar named HasSeqOps. The goal of the PR I linked is to unify them.

The idea I have in mind would be for your work to target the collections-contrib module. But you can also do a PR on scala/scala (without using IsSeqLike, since you will then be able to implement it directly in the collections) and start a discussion to have them in the standard library…

julienrf avatar Jun 12 '18 06:06 julienrf

@julienrf Here's another naming quibble!

I think the names splitWith and selectSplit are terrible!

  • selectSplit should be called runsWhere as it splits a sequence into runs (i.e., consecutive subsequences) where a predicate is true;

  • splitWith sounds like it would behave like String's split, but with an element predicate instead of an actual character (as in "ab$cd$efg".splitWith(notLetter) == Seq("ab","cd","efg")); maybe a better name would be runsUntilFlip? I'm not sure the function is that useful, though; I'd prefer a function that returns two distinct sequences, which would be called partitionRunsWhere or partitionRunsWith.
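To make the two readings concrete, here is a rough sketch on `List`. Both functions are hypothetical: `splitWhere` is one possible String-like split driven by an element predicate, and `partitionRunsWhere` is one interpretation of "returns two distinct sequences" (the runs that satisfy the predicate, and the runs that do not). Neither exists in the standard library.

```scala
object RunsSketch {
  // Hypothetical String-like split: elements matching `isSep` are dropped and
  // act as boundaries. Unlike String.split, empty chunks are all kept, and an
  // empty input yields a single empty chunk.
  def splitWhere[A](xs: List[A])(isSep: A => Boolean): List[List[A]] = {
    val (chunk, rest) = xs.span(a => !isSep(a))
    rest match {
      case Nil       => List(chunk)
      case _ :: tail => chunk :: splitWhere(tail)(isSep)
    }
  }

  // Hypothetical: collect maximal runs of consecutive elements, separating the
  // runs that satisfy `p` (first component) from those that do not (second).
  def partitionRunsWhere[A](xs: List[A])(p: A => Boolean): (List[List[A]], List[List[A]]) = {
    @annotation.tailrec
    def loop(rest: List[A], yes: List[List[A]], no: List[List[A]]): (List[List[A]], List[List[A]]) =
      rest match {
        case Nil => (yes.reverse, no.reverse)
        case head :: _ =>
          val flag = p(head)
          val (run, tail) = rest.span(a => p(a) == flag)
          if (flag) loop(tail, run :: yes, no) else loop(tail, yes, run :: no)
      }
    loop(xs, Nil, Nil)
  }
}
```

For example, `RunsSketch.splitWhere("ab$cd$efg".toList)(c => !c.isLetter).map(_.mkString)` produces `List("ab", "cd", "efg")`, matching the expectation in the comment above.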

LPTK avatar Jun 15 '18 09:06 LPTK

We have precedent for an operation returning consecutive subsequences: grouped. I think we should use grouped instead of runs.

julienrf avatar Jun 15 '18 09:06 julienrf

It's unfortunate, because the "group" terminology does not conceptually imply "consecutive-ity", which is a central property here. This is exacerbated by other methods like groupBy also talking about groups, but groups that are not consecutive!

In fact, it's also a useful operation to get consecutive groups sharing the same result by some function. What would you call it, then, given that groupBy is taken? groupedBy? Scalaz calls it splitBy... it could also be called splitRunsBy 🙃
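The difference between the existing groupBy and a consecutive variant can be sketched as follows. The name `groupedBy` is just one of the candidates floated here, and the implementation is an illustrative assumption, not an existing API.

```scala
object GroupedBySketch {
  // Hypothetical: group *consecutive* elements that map to the same key,
  // preserving the order of the input. The same key may appear several times.
  // Not stack-safe for very long inputs; this is only an illustration.
  def groupedBy[A, K](xs: List[A])(f: A => K): List[(K, List[A])] =
    xs match {
      case Nil => Nil
      case head :: _ =>
        val key = f(head)
        val (run, rest) = xs.span(a => f(a) == key)
        (key, run) :: groupedBy(rest)(f)
    }

  val input = List(1, 3, 2, 4, 5)

  // Standard-library groupBy: one entry per key, consecutiveness is ignored.
  val byKey: Map[Int, List[Int]] = input.groupBy(_ % 2)

  // Consecutive variant: the key 1 shows up twice because 5 starts a new run.
  val byRun: List[(Int, List[Int])] = groupedBy(input)(_ % 2)
}
```

Here `byKey` is `Map(1 -> List(1, 3, 5), 0 -> List(2, 4))`, whereas `byRun` is `List((1, List(1, 3)), (0, List(2, 4)), (1, List(5)))`.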

LPTK avatar Jun 15 '18 10:06 LPTK

They are also similar to partition :smile: How about:

| Scalaz | collections |
|---|---|
| splitWith | partitionAll |
| splitBy | partitionGroup |
| selectSplit | partitionFilter |

joroKr21 avatar Jun 15 '18 13:06 joroKr21

@joroKr21 - I think you mean span, not partition. partition does not respect order.
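As a quick reminder of how the two standard-library methods differ, here is a small example; only the object and value names are made up for illustration.

```scala
object SpanVsPartition {
  val xs = List(2, 4, 1, 6, 3)

  // span stops at the first element that fails the predicate:
  // the result is the longest satisfying prefix plus everything after it.
  val (prefix, rest) = xs.span(_ % 2 == 0)

  // partition looks at every element and sorts it into one of two buckets,
  // so elements that were not adjacent in the input can end up together.
  val (evens, odds) = xs.partition(_ % 2 == 0)
}
```

Here `span` yields `(List(2, 4), List(1, 6, 3))`, while `partition` yields `(List(2, 4, 6), List(1, 3))`.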

Anyway, split is more appropriate for a name because that's already what it means for String.

If we want semantics substantially different from those in String, we may wish to use another word, especially if it collides with the split name on String. Note that String split drops the characters it splits on.

Other possibilities include grouped (we already do it by number), fragment, dice, chop, shatter, etc.

I do think it's important that the action word go first and the modifier go afterwards, so no selectSplit but rather splitSelect.

Ichoran avatar Jun 15 '18 16:06 Ichoran

Yes, I meant span... I keep confusing these two methods and for the life of me I can't remember which is which.

If split is chosen, then it should do the same thing as String.split: it should drop the element we split on, but it would also have to replicate the (inconsistent IMO) way String.split handles empty substrings:

scala> "...".split('.')
res10: Array[String] = Array()

scala> ".42...".split('.')
res11: Array[String] = Array("", 42)

scala> ".42...13.".split('.')
res12: Array[String] = Array("", 42, "", "", 13)
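A sequence version that mirrors these three results could look roughly like the sketch below. `splitOn` is a hypothetical name, and the code only imitates the behaviour shown above (separators dropped, leading and inner empty chunks kept, trailing empty chunks removed); it does not try to match every corner case of String.split, such as the empty input.

```scala
object SplitLikeString {
  // Hypothetical: split `xs` on every occurrence of `sep`, dropping `sep`.
  def splitOn[A](xs: List[A])(sep: A): List[List[A]] = {
    // Not stack-safe for very long inputs; this is only an illustration.
    def chunks(rest: List[A]): List[List[A]] = {
      val (chunk, tail) = rest.span(_ != sep)
      tail match {
        case Nil        => List(chunk)
        case _ :: after => chunk :: chunks(after)
      }
    }
    // Keep leading and inner empty chunks, remove the trailing ones.
    chunks(xs).reverse.dropWhile(_.isEmpty).reverse
  }

  // Convenience wrapper so results can be compared with the REPL session.
  def splitString(s: String, sep: Char): List[String] =
    splitOn(s.toList)(sep).map(_.mkString)
}
```

With this sketch, `splitString("...", '.')` is empty, `splitString(".42...", '.')` is `List("", "42")`, and `splitString(".42...13.", '.')` is `List("", "42", "", "", "13")`.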

I would also be happy with groupedWith, groupedBy and groupedFilter, but then groupBy vs groupedBy will be confusing. I don't like select because it is often used in relational context where it has a completely different meaning.

:+1: for the action first

joroKr21 avatar Jun 16 '18 08:06 joroKr21