streamly icon indicating copy to clipboard operation
streamly copied to clipboard

`groupsBy` lambda parameter order

Open shlok opened this issue 5 years ago • 6 comments
trafficstars

I wonder whether the parameter order for the lambda in Streamly.Prelude.groupsBy is intentional. Example:

>>> import qualified Streamly.Prelude as S
>>> import qualified Streamly.Internal.Data.Fold as FL
>>> S.toList $ S.groupsBy (>) FL.toList $ S.fromList [1,3,7,0,2,5]
> [[1,3,7],[0,2,5]]

However, an analogous example using Data.List.groupBy gives

>>> import qualified Data.List as L
>>> L.groupBy (>) [1,3,7,0,2,5]
> [[1],[3],[7,0,2,5]]

To get the same result in the first example, I have to switch the lambda’s parameters (i.e., change > to <).

Judging by the documentation of groupsBy (“a `cmp` b” and “a `cmp` c”), this deviation from Data.List was not intentional.

Would you suggest updating groupsBy (or creating another version of it for backwards compatibility), or simply leaving the existing behavior alone and fixing the documentation?

shlok avatar Jun 26 '20 10:06 shlok

I guess this was not intentional. For backward compatibility we should fix the documentation and leave the implementation the way it is, otherwise it can lead to silent bugs in existing programs. We should also point out in bold that the behavior is opposite to that of the list groupBy.

To avoid confusion, the combinator should perhaps be named as groupWhile i.e. keep grouping while the condition is true. The current combinator is more like groupUntil i.e. keep grouping until the condition becomes true.

How about creating a groupWhile combinator and deprecating the current one?

harendra-kumar avatar Jun 26 '20 11:06 harendra-kumar

@harendra-kumar If we add groupWhile, do you think we should also rename groupsByRolling to groupWhileRolling (for naming consistency)? The end-result I have in mind would be:

  • groupsBy: Unchanged. Doc changes. Deprecated. (Unexpected lambda parameter order.)
  • groupsByRolling: Unchanged. Deprecated. (Already has the expected lambda parameter order.)
  • groupWhile: Added. A variant of groupsBy with expected lambda parameter order.
  • groupWhileRolling: Added. Same as groupsByRolling.

shlok avatar Sep 18 '20 12:09 shlok

A few points to consider:

  • we should name it groupsWhile instead of groupWhile. Notice the old name was groupsBy and we have a groups as well. This is because we are creating many groups and not a group.
  • We now have groupBy in Fold and Parser module as well. In that case we need to have it groupWhile because the fold or the parser terminates after creating one group.

Also, the relevance of the groupsBy or groupsWhile operation in the stream module is questionable now that we are going to have Fold/Parser groupWhile operation, and we can just do groupsWhile = foldMany groupWhile. So we can even just deprecate it and recommend using foldMany groupWhile.

cc @adithyaov

harendra-kumar avatar Sep 19 '20 06:09 harendra-kumar

For me, ...While resonates with combinators that have the signature, (a -> Bool) whereas ...By resonates with (a -> a -> Bool).

Although this isn't strictly followed in the library (wordsBy), it is followed in most places.

adithyaov avatar Dec 10 '20 17:12 adithyaov

All the combinators will be reimplemented in serial stream. This can be corrected in serial stream. See https://github.com/composewell/streamly/issues/156

adithyaov avatar Jun 14 '22 14:06 adithyaov

An idiomatic implementation of groupsBy would be to use Parser.groupBy with parseMany,

groupsBy = parseMany Parser.groupBy

The argument order in Parser.groupBy is similar to that of Data.List. Ideally we should be using the idiomatic implementation. The idiomatic implementation, however, has some regressions.

Benchmark                                            default(μs)
---------------------------------------------------- -----------

All.Data.Stream/o-1-space.grouping.groupsByEq            1686.42
All.Data.Parser/o-1-space.parseMany (groupBy (==))       2451.07

All.Data.Stream/o-1-space.grouping.groupsByGT            1093.54
All.Data.Parser/o-1-space.parseMany (groupBy (<))        1023.13

Data.Stream(Allocated)
Benchmark                                            default(MiB)
---------------------------------------------------- ------------

All.Data.Stream/o-1-space.grouping.groupsByEq               10.67
All.Data.Parser/o-1-space.parseMany (groupBy (==))          12.95

All.Data.Stream/o-1-space.grouping.groupsByGT                6.09
All.Data.Parser/o-1-space.parseMany (groupBy (<))            6.10

adithyaov avatar Sep 15 '22 10:09 adithyaov