core-libraries-committee
core-libraries-committee copied to clipboard
Still more NonEmpty variants of inits & tails
Previously: #67
We currently have:
List.inits, List.tails :: [a] -> [[a]]
NonEmpty.inits, NonEmpty.tails :: Foldable f => f a -> NonEmpty [a]
NonEmpty.inits1, NonEmpty.tails1 :: NonEmpty a -> NonEmpty (NonEmpty a)
These three result types ([[a]]
, NonEmpty [a]
, NonEmpty (NonEmpty a)
) form an incomplete square; the missing type is [NonEmpty a]
, and there is a perfectly reasonable variant of these functions that would return that type (it resembles inits1
/tails1
but doesn't require its input to be non-empty).
The names are up for debate since I'm unable to find an existing convention for functions matching these types. But I propose adding to Data.List.NonEmpty
the following (with these names or better ones):
nonEmptyInits, nonEmptyTails :: Foldable f => f a -> [NonEmpty a]
nonEmptyInits = Prelude.map fromList . List.drop 1 . List.inits . Foldable.toList
nonEmptyTails = Prelude.map fromList . List.init . List.tails . Foldable.toList
These are subexpressions of the current implementations of inits1
and tails1
, which should be refactored to reference them. If these implementations are later optimized to, e.g., make fewer unnecessary run-time checks for emptiness, exposing them as functions makes those improvements available to users who would otherwise be defining these variants themselves.
It seems these identities hold (could someone double check?)
List.inits == Foldable.toList . NonEmpty.inits
nonEmptyInits == Foldable.toList . NonEmpty.inits1
If that's so, then perhaps this new combinator should be called List.inits1
?
Putting the new functions in Data.List
would introduce a circular dependency with Data.List.NonEmpty
.
-1. Just use toList
if you're comfortable using NonEmpty
.
@mixphix, if my input is a possibly-empty list, toList . NonEmpty.inits1
doesn't suffice. Is that what you're recommending or have I misunderstood?
Excuse me?
ghci> :m +Data.Foldable Data.List Data.List.NonEmpty
-- 1
ghci> Data.List.tails [1, 2, 3, 4]
[[1,2,3,4],[2,3,4],[3,4],[4],[]]
-- 2
ghci> Data.List.NonEmpty.tails (1 :| [2, 3, 4])
[1,2,3,4] :| [[2,3,4],[3,4],[4],[]]
-- 3
ghci> Data.List.NonEmpty.tails1 (1 :| [2, 3, 4])
(1 :| [2,3,4]) :| [2 :| [3,4],3 :| [4],4 :| []]
-- 4
ghci> Data.List.NonEmpty.toList $ Data.List.NonEmpty.tails1 (1 :| [2, 3, 4])
[1 :| [2,3,4],2 :| [3,4],3 :| [4],4 :| []]
-- 5
ghci> Data.List.NonEmpty.toList <$> Data.List.NonEmpty.tails1 (1 :| [2, 3, 4])
[1,2,3,4] :| [[2,3,4],[3,4],[4]]
ghci> let nonEmptyTails = fmap Data.List.NonEmpty.fromList . Data.List.init . Data.List.tails . Data.Foldable.toList
ghci> nonEmptyTails (1 :| [2, 3, 4])
[1 :| [2,3,4],2 :| [3,4],3 :| [4],4 :| []] -- same as 4
ghci> nonEmptyTails [1, 2, 3, 4]
[1 :| [2,3,4],2 :| [3,4],3 :| [4],4 :| []] -- same as 4
But nonEmptyTails
also works on [a]
; toList . tails1
doesn't.
mapMaybe nonEmpty . Data.List.tails
This is not primitive, not useful, and not hard to get by without.
Again, these functions appear as subexpressions of the existing inits1
and tails1
implementations, which seems like evidence of usefulness to me, as well as evidence that these are, if not ‘primitive’, at least more so than the approved inits1
and tails1
.
I agree we can get by without it. It's not hard to implement, but neither is it hard to implement inits1
or tails1
. There are, I think, interesting performance questions for these functions—namely, do they interfere with list fusion, and do they waste cycles checking at run time emptiness properties that can be known at compile time. I don't think answering these questions is trivial. So the actual thing I would hope to save users from doing by including nonEmptyInits
and nonEmptyTails
in base
is not the cost of implementing them, but the cost of determining which of their possible implementations works best with the rest of base
for optimization purposes. (I don't claim that my proposed implementation is at all optimal in this sense, but once the CLC approves some implementation I can iterate with GHC on getting the best implementation.)
Regardless, base
already contains an implementation of both functions. They're just contained within the definitions of inits1
and tails1
. I would think that the barrier to exposing an existing implementation should be lower than the barrier to taking on an additional function.
Sorry if this is too trivial a thing to suggest.
Prior art is Data.ByteString.initsNE
and Data.ByteString.tailsNE
and work-in-progress PR, adding the same functions to text
: https://github.com/haskell/text/pull/558.
I personally think that this is a useful addition. It's fairly common to encounter things like tail . inits
, which would be nice to rewrite without partial functions as NE.tail . initsNE
.
Sadly for me, those functions are analogous to the existing NonEmpty.inits
, not my proposed variants. The ByteString
analog of nonEmptyInits
would return [NonEmptyByteString]
, if such a type were to exist.
Ah, my bad, sorry.
I'm not really sure about it then. If we can come up with good, self-explaining names then improving consistency and symmetry would be nice. But otherwise - as one can guess from my mistake above - there is a cost of offering users too many confusingly similar alternatives.
@rhendric how would you like to proceed? If you would like us to vote as is, please raise an MR.
If that's so, then perhaps this new combinator should be called
List.inits1
?
@tomjaguarpaw, I owe you an apology. I didn't consider this proposal seriously enough because I wasn't aware of how list types and functions are distributed among the various internal modules, and now that I've taken a look I see it's possible for things in Data.List
to depend on the NonEmpty
type. I've adopted your approach; reusing the inits1
and tails1
names in the List
namespace strikes me as much clearer than any alternative.
MR is here: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12659
Dear CLC members, any non-binding opinions on this?
@tomjaguarpaw @mixphix @hasufell @angerman @parsonsmatt @velveteer
Are those expressions common idioms? I find it hard to follow the motivation. "Symmetry" is a bit vague.
The motivation is identical to that in #67, because these inits1
and tails1
are the same functions as those inits1
and tails1
, just with the surrounding NonEmpty.fromList .
/ . Foldable.toList
removed. There's no reason why the desire to get only non-empty inits/tails should be limited to inputs that are themselves non-empty.
I can quote the example from that earlier proposal verbatim, with one edit:
A concrete example where this would be useful is in the PureScript compiler: https://github.com/purescript/purescript/blob/8181c4fee34fdfa576ad7029ec2303f71020e1b6/src/Language/PureScript/TypeChecker/Entailment.hs#L261-L273 At the moment, we have code that uses regular lists, and does
for (init $ tails chain) $ \(tcd:tl) -> ...
where
chain
is a regular list which is known to be nonempty, and where we have an unsafe incomplete pattern match which doesn't blow up at runtime because of theinit
(which removes the empty list at the end oftails
). Withtails1
, we couldchangechange the above line tochain
to be aNonEmpty
, andfor (tails1 chain) $ \(tcd :| tl) -> ...
and have all of the non-emptiness facts that we are relying on tracked safely in the types.
The struck-out bit is the point of this proposal.
Weak +1 for symmetry (as in less surprising behaviour for the consumer). I guess I'm fundamentally not fully convinced that I want *1 shorthands over explicit composition of more basic building blocks. However that's mostly a very personal preference for me. Hence weak +1 if it helps others. It doesn't seem to be breaking existing code.
@rhendric would you like to trigger a vote? Or is there more to discuss?
Yes, vote please. I've tried to amend the original description to reflect the points I raised along the way. I'm still not confident I have a good read on what's important to the CLC in terms of motivation, but there are both aesthetic (‘symmetry’) and practical arguments for these functions, and I may have given too much precedence to the former. Please review the top comment and vote on that and not on anything you thought I may have been selling over the last several months.
Dear CLC members, let's vote on the proposal to add inits1, tails1 :: [a] -> [NonEmpty a]
to Data.List
, which concludes work started in #67 to provide complete set of variants of inits
and tails
for empty and non-empty lists. Please read the executive summary in https://github.com/haskell/core-libraries-committee/issues/252#issue-2119832504 for more details.
@tomjaguarpaw @mixphix @angerman @hasufell @parsonsmatt @velveteer
I've been somewhat sceptical of the proposal, but today accidentally I was trying to build a certain variation of span
and realised that I lack tails1 :: [a] -> NonEmpty a
to avoid manual recursion.
The reason is that functions which return a tail of the list untouched (drop
, splitAt
, dropWhile
, span
, break
, stripPrefix
, deleteBy
, insertBy
to name a few) are usually expressible via paramorphism, which is a fold having access not only to the current element and accumulator but also to the rest of the list:
para :: forall a b. (a -> [a] -> b -> b) -> b -> [a] -> b
para f z = go
where
go :: [a] -> b
go [] = z
go (x : xs) = f x xs (go xs)
For example,
span :: (a -> Bool) -> [a] -> ([a], [a])
span p = para (\x xs -> if p x then first (x :) else const ([], x : xs)) ([], [])
We could have expressed span
via foldr
, but it would be inefficient, because it would have to traverse and rebuild the entire suffix of the list. But para
allows us to shortcut and return the suffix untouched as soon as we reach it.
Now we don't have para
anywhere near base
, but what I realised is that para
is just tails1 :: [a] -> NonEmpty a
in disguise:
para :: forall a b. (a -> [a] -> b -> b) -> b -> [a] -> b
para f z = foldr (\(x :| xs) -> x `f` xs) z . tails1
(...list paramorphism is a composition of list catamorphism applied to NonEmpty
catamorphism and tails1
...)
So we can write
span :: (a -> Bool) -> [a] -> ([a], [a])
span p = foldr (\(x :| xs) -> if p x then first (x :) else const ([], x : xs)) ([], []) . tails1
Given that tails1
is defined via build
, it all fuses perfectly well, eliminating any overhead for (:|)
. Let me demonstrate it on a simpler function:
dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p = foldr (\(x :| xs) -> if p x then id else const (x : xs)) [] . tails1
becomes
dropWhile :: forall a. (a -> Bool) -> [a] -> [a]
dropWhile
= \ (@a_a1bp) (p_a10U :: a_a1bp -> Bool) (eta_B0 :: [a_a1bp]) ->
joinrec {
tails1Go_s1h6 :: [a_a1bp] -> [a_a1bp]
tails1Go_s1h6 (ds_d1cJ :: [a_a1bp])
= case ds_d1cJ of wild_X1 {
[] -> [];
: x_aBK xs_aBL ->
case p_a10U x_aBK of {
False -> wild_X1;
True -> jump tails1Go_s1h6 xs_aBL
}
}; } in
jump tails1Go_s1h6 eta_B0
So I'm strongly +1.
+1
+1
Thanks for the exposition, @Bodigrim. I've come around on this proposal. +1
Still +1
+1
Thanks all, that's enough votes to approve.