core-libraries-committee
core-libraries-committee copied to clipboard
Add Data.List.compareLength
Let's add to Data.List
a new function:
compareLength :: [a] -> Int -> Ordering
compareLength = foldr (\_ f m -> if 0 > m then GT else f (m - 1)) (compare 0)
such that
compareLength xs c = compare (length xs) c
This function gives the same answer as comparing against the result of length
, but can short circuit if the count of elements is greater than the number, and hence be more efficient. It is also productive on infinite inputs, for which compare (length xs) c
would loop infinitely:
> compareLength [] 0
EQ
> compareLength [] 1
LT
> compareLength ['a'] 1
EQ
> compareLength ['a', 'b'] 1
GT
> compareLength [0..] 100
GT
Seasoned Haskellers know that the same effect can be achieved with genericLength
and Peano numbers (https://chrisdone.com/posts/lazy-list-length/), but let's not pretend this to be ergonomic or efficient.
Prior art:
- Is the other argument order also seen in the wild? I would have expected the list to be last.
- An equally important function is the continuous extension of
compare `on` length
. I believe we should consider the names of both functions at the same time.
Is the other argument order also seen in the wild? I would have expected the list to be last.
No, I cannot find any examples for Int -> [a] -> Ordering
.
An equally important function is the continuous extension of
compare `on` length
. I believe we should consider the names of both functions at the same time.
Prior art in extra
suggests comparingLength :: [a] -> [a] -> Ordering
.
Another function whose name I think we should have in the backs of our minds when considering this proposal is the one checking whether the length of a list is at least a particular number, and the two-list variants of that one (i.e., the extensions of (>=) `on` length
and (>=) `on` length
, extra lazy in the first and second argument, respectively). This zoo is out there in the wild (e.g., the GHC source code), but I don't know if anyone's come up with a nice, consistent, naming scheme.
Edit: similarly, less than and greater than.
similar: https://hackage.haskell.org/package/utility-ht-0.0.17.1/docs/Data-List-Match.html#v:compareLength
A simple implementation would be:
compareLength :: [a] -> Int -> Ordering
compareLength xs n = compare (void xs) (replicate n ())
I'm not sure I want to bikeshed naming of other similar functions right now. There is consistent prior art to use compareLength
for [a] -> Int -> Ordering
, and I'm happy to leave further developments to others. I do not think that such developments should steer us away from a well-attested name.
With regards to lazy analogs of length xs {<,<=,>,>=} c
: while they could be marginally better than compareLength
in certain scenarios, I'll leave it to others to convince CLC members that they are worth inclusion into base
. IMO compareLength
solves 95% of issues with performance and laziness.
compareLength xs n = compare (void xs) (replicate n ())
is neat, but foldr
in the top post is much better in terms of performance. It compiles to a tight little loop without any extra allocations:
compareLength1 :: forall {a}. [a] -> Int# -> Ordering
compareLength1
= \ (@a_s13T) (ds_s13L :: [a_s13T]) (ww_s13O :: Int#) ->
case ds_s13L of {
[] ->
case <# 0# ww_s13O of {
__DEFAULT ->
case ww_s13O of {
__DEFAULT -> GT;
0# -> EQ
};
1# -> LT
};
: y_a13H ys_a13I ->
case ># 0# ww_s13O of {
__DEFAULT -> compareLength1 ys_a13I (-# ww_s13O 1#);
1# -> GT
}
}
This seems like a solid improvement and follows prior art. I'm in favor.
What should be the result of compareLength undefined (-1)
?
For Data.ByteString.Lazy.compareLength
it is GT
.
For Data.Text.compareLength
and Data.List.Extra.compareLength
it is undefined
.
In other words, should we be lazy in the list if the length is negative?
There seem to be three sensible functions with this name:
compareLength_L :: Int -> [a] -> Ordering
compareLength_R :: [a] -> Int -> Ordering
compareLength_C :: [a] -> [a] -> Ordering
and no clear winner. Of compareLength_L
and compareLength_R
I would prefer compareLength_L
because most often the Int
argument will be a constant, which allows for partial application like in map (compareLength 1)
.
I guess you want to save an import of a utility library. However the price to pay is to restrict your code to a recent version GHC. I would not add such a function to base
.
I'm finding it hard to understand where this would be used. Could someone point to examples in the wild?
I'm finding it hard to understand where this would be used. Could someone point to examples in the wild?
I don't remember the names of the functions, or the module in which they're defined, but GHC uses several such functions. Vague recollection: try git grep lengthAtLeast
.
The text
and bytestring
versions of this function come with an extensive set of rewrite rules, should these be added here too?
And what about generalising to Foldable
and/or Num
?
As @treeowl mentions, GHC uses:
lengthAtMost :: [a] -> Int -> Bool
lengthIsNot :: [a] -> Int -> Bool
lengthIs :: [a] -> Int -> Bool
lengthAtLeast :: [a] -> Int -> Bool
lengthExceeds :: [a] -> Int -> Bool
lengthLessThan :: [a] -> Int -> Bool
compareLength :: [a] -> [b] -> Ordering
equalLength :: [a] -> [b] -> Bool
These are used to avoid forcing lazy lists or to avoid traversing potentially long lists.
What should be the result of
compareLength undefined (-1)
? ... In other words, should we be lazy in the list if the length is negative?
@meooow25 A weak precedent is that take
is lazy in the list if the length is non-positive:
> take 0 undefined
[]
Ideally I'd love to have compareLength :: [a] -> Word -> Ordering
, which does not pose this kind of tricky questions, but it would be out of line with the rest of Data.List
API.
I guess you want to save an import of a utility library. However the price to pay is to restrict your code to a recent version GHC. I would not add such a function to
base
.
@thielema this seems to be a general argument against adding anything ever to base
, isn't it? Is this proposal somehow particularly aggravating?
I want base
to exhibit best practices and help users to avoid common peformance / divergence pitfalls. Users can continue using utility libraries such as extra
, which will conditionally re-export a definition from base
, so there is no price to pay in terms of restricting to new GHCs only. We followed this pattern already for several functions, e. g., (!?)
and unsnoc
.
@tomjaguarpaw this proposals stems from our discussion with @Noughtmare at https://github.com/kowainik/stan/issues/550.
@Bodigrim that makes sense. But the proposed implementation needs some changes if it is to be lazy.
Since GHC uses compareLength :: [a] -> [b] -> Ordering
(since things with same type only are comparable) and lengthSomething :: [a] -> Int -> Bool
,
it is good to name them:
lengthGuess :: [a] -> Int -> Ordering
-- or
guessLength :: [a] -> Int -> Ordering
compareLength :: [a] -> [b] -> Ordering
I still like compareLength
more because it is the concatenation of the two functions you would otherwise use: compare (length xs) n
. I'd use comparingLength
for the function that GHC calls compareLength
, because that's again the concatenation of the two functions you'd use: comparing length xs ys
@sjoerdvisscher excellent questions!
The
text
andbytestring
versions of this function come with an extensive set of rewrite rules, should these be added here too?
Rules rewriting compare (length xs) n
to compareLength xs n
could change semantics, making diverging programs terminate. While I'd expect such change to be a desirable one in the majority of cases, I'd rather left it to a (potential) subsequent proposal.
In theory users might use length xs
as a poor man's deepseq
and expect that after compare (length xs) n
the spine of xs
has been forced. Is it a reasonable expectation?.. Could it be crucial property for certain programs not to explode?.. Let me leave it to someone else to explore.
And what about generalising to
Foldable
[...]?
While the definition above indeed works for any Foldable
, it's hard to tell whether compareLength
is better than (compare .) . length
for a given Foldable
or no. If length
is O(1) then compare (length xs) n
is faster than compareLength xs n
which is O(n).
Note that introduction of a monomorphic Data.List.compareLength
does not preclude anyone from raising a proposal to add Data.Foldable.compareLength
or whatever they fancy to argue for.
I think a good addition to the proposal would be Data.List.NonEmpty.compareLength
, because we know that length
is inefficient for NonEmpty
too.
And what about generalising to [...]
Num
?
It would be quite attractive for two reasons:
- to use
Word
instead ofInt
so that negative numbers are not an option, - to replace
compare (genericLength xs) (10 :: Int8)
(which does not do what you think) withcompareLength xs (10 :: Int8)
(which does what you think).
Yet I imagine that in the majority of cases n
in compareLength xs n
is a literal constant, so the change would force everyone to write an explicit type as in compareLength xs (10 :: Int)
, which will be annoying soon. So I think I'd stick to monomorphic Int
, continuing the notorious tradition of Data.List
.
@Bodigrim do you have an MR prepared for this function? More than once since you brought it up have I wished we already had it in base
.
I've updated the proposal and linked the MR.
I suggest a couple of changes.
compareLength :: [a] -> Int -> Ordering
compareLength xs n
| n < 0 = GT
- | otherwise = foldr (\_ f m -> if 0 > m then GT else f (m - 1)) (compare 0) xs n
+ | otherwise = foldr (\_ f m -> if m > 0 then f (m - 1) else GT) (\m -> if m > 0 then LT else EQ) xs n
- In the fold function,
m
being0
is enough to tell us that the list is longer. - Given that
m
is never negative, the final check does not need to test forGT
because it's impossible.
@meooow25 could you possibly elaborate what's improved by the suggested change?
Sure,
- If we stop at
m == 0
, we have to look at one less element to conclude with a result. This is observable with an example likeλ> foldr (\_ f m -> if 0 > m then GT else f (m - 1)) (compare 0) (1:2:undefined) 1 *** Exception: Prelude.undefined ... λ> foldr (\_ f m -> if m > 0 then f (m - 1) else GT) (compare 0) (1:2:undefined) 1 GT
-
compare 0
is short for (an equivalent of)\m -> if 0 < m then LT else (if 0 == m then EQ else GT)
. With the above change, the last branch which requiresm < 0
is impossible. So the0 == m
test will always be true and can be removed.
@meooow25 thanks, I've update the MR. Does it look good now?
Looks good, thanks!
I am wondering if it would be useful to INLINE
compareLength
for list fusion, but I am not convinced. Usually you would want to do something with the list after checking the length, which would kill fusion. I leave this to someone with a stronger opinion.
My preference is to apply {-# INLINE #-}
only if well justified, so I'd rather leave it to another proposal / proposer to investigate.
I'll trigger a vote soon unless there are more comments / suggestions.
Dear CLC members, let's vote on the proposal to add Data.List{,.NonEmpty}.compareLength :: [a] -> Int -> Ordering
, which is a safer and faster alternative to compare (length xs) n
. Please refer to the top post for more details, examples and prior art in existing libraries.
@hasufell @velveteer @mixphix @tomjaguarpaw @angerman @parsonsmatt
+1 from me, unsurprisingly.
+1, anything that reduces length
on lists is a win
+1
@mixphix @hasufell @tomjaguarpaw @angerman just a gentle reminder to vote.