streamly
streamly copied to clipboard
`groupsBy` lambda parameter order
I wonder whether the parameter order for the lambda in Streamly.Prelude.groupsBy is intentional. Example:
>>> import qualified Streamly.Prelude as S
>>> import qualified Streamly.Internal.Data.Fold as FL
>>> S.toList $ S.groupsBy (>) FL.toList $ S.fromList [1,3,7,0,2,5]
> [[1,3,7],[0,2,5]]
However, an analogous example using Data.List.groupBy gives
>>> import qualified Data.List as L
>>> L.groupBy (>) [1,3,7,0,2,5]
> [[1],[3],[7,0,2,5]]
To get the same result in the first example, I have to switch the lambda’s parameters (i.e., change > to <).
Judging by the documentation of groupsBy (“a `cmp` b” and “a `cmp` c”), this deviation from Data.List was not intentional.
Would you suggest updating groupsBy (or creating another version of it for backwards compatibility), or simply leaving the existing behavior alone and fixing the documentation?
I guess this was not intentional. For backward compatibility we should fix the documentation and leave the implementation the way it is, otherwise it can lead to silent bugs in existing programs. We should also point out in bold that the behavior is opposite to that of the list groupBy.
To avoid confusion, the combinator should perhaps be named as groupWhile i.e. keep grouping while the condition is true. The current combinator is more like groupUntil i.e. keep grouping until the condition becomes true.
How about creating a groupWhile combinator and deprecating the current one?
@harendra-kumar If we add groupWhile, do you think we should also rename groupsByRolling to groupWhileRolling (for naming consistency)? The end-result I have in mind would be:
groupsBy: Unchanged. Doc changes. Deprecated. (Unexpected lambda parameter order.)groupsByRolling: Unchanged. Deprecated. (Already has the expected lambda parameter order.)groupWhile: Added. A variant ofgroupsBywith expected lambda parameter order.groupWhileRolling: Added. Same asgroupsByRolling.
A few points to consider:
- we should name it
groupsWhileinstead ofgroupWhile. Notice the old name wasgroupsByand we have agroupsas well. This is because we are creating many groups and not a group. - We now have
groupByin Fold and Parser module as well. In that case we need to have itgroupWhilebecause the fold or the parser terminates after creating one group.
Also, the relevance of the groupsBy or groupsWhile operation in the stream module is questionable now that we are going to have Fold/Parser groupWhile operation, and we can just do groupsWhile = foldMany groupWhile. So we can even just deprecate it and recommend using foldMany groupWhile.
cc @adithyaov
For me, ...While resonates with combinators that have the signature, (a -> Bool) whereas ...By resonates with (a -> a -> Bool).
Although this isn't strictly followed in the library (wordsBy), it is followed in most places.
All the combinators will be reimplemented in serial stream. This can be corrected in serial stream. See https://github.com/composewell/streamly/issues/156
An idiomatic implementation of groupsBy would be to use Parser.groupBy with parseMany,
groupsBy = parseMany Parser.groupBy
The argument order in Parser.groupBy is similar to that of Data.List.
Ideally we should be using the idiomatic implementation.
The idiomatic implementation, however, has some regressions.
Benchmark default(μs)
---------------------------------------------------- -----------
All.Data.Stream/o-1-space.grouping.groupsByEq 1686.42
All.Data.Parser/o-1-space.parseMany (groupBy (==)) 2451.07
All.Data.Stream/o-1-space.grouping.groupsByGT 1093.54
All.Data.Parser/o-1-space.parseMany (groupBy (<)) 1023.13
Data.Stream(Allocated)
Benchmark default(MiB)
---------------------------------------------------- ------------
All.Data.Stream/o-1-space.grouping.groupsByEq 10.67
All.Data.Parser/o-1-space.parseMany (groupBy (==)) 12.95
All.Data.Stream/o-1-space.grouping.groupsByGT 6.09
All.Data.Parser/o-1-space.parseMany (groupBy (<)) 6.10