core-libraries-committee Still more NonEmpty variants of inits & tails

Previously: #67

We currently have:

List.inits, List.tails :: [a] -> [[a]]
NonEmpty.inits, NonEmpty.tails :: Foldable f => f a -> NonEmpty [a]
NonEmpty.inits1, NonEmpty.tails1 :: NonEmpty a -> NonEmpty (NonEmpty a)

These three result types ([[a]], NonEmpty [a], NonEmpty (NonEmpty a)) form an incomplete square; the missing type is [NonEmpty a], and there is a perfectly reasonable variant of these functions that would return that type (it resembles inits1/tails1 but doesn't require its input to be non-empty).

The names are up for debate since I'm unable to find an existing convention for functions matching these types. But I propose adding to Data.List.NonEmpty the following (with these names or better ones):

nonEmptyInits, nonEmptyTails :: Foldable f => f a -> [NonEmpty a]
nonEmptyInits = Prelude.map fromList . List.drop 1 . List.inits . Foldable.toList
nonEmptyTails = Prelude.map fromList . List.init . List.tails . Foldable.toList

These are subexpressions of the current implementations of inits1 and tails1, which should be refactored to reference them. If these implementations are later optimized to, e.g., make fewer unnecessary run-time checks for emptiness, exposing them as functions makes those improvements available to users who would otherwise be defining these variants themselves.

Feb 06 '24 02:02 rhendric

It seems these identities hold (could someone double check?)

List.inits == Foldable.toList . NonEmpty.inits
nonEmptyInits == Foldable.toList . NonEmpty.inits1

If that's so, then perhaps this new combinator should be called List.inits1?

Feb 06 '24 07:02 tomjaguarpaw

Putting the new functions in Data.List would introduce a circular dependency with Data.List.NonEmpty.

Feb 06 '24 21:02 rhendric

-1. Just use toList if you're comfortable using NonEmpty.

Feb 07 '24 23:02 mixphix

@mixphix, if my input is a possibly-empty list, toList . NonEmpty.inits1 doesn't suffice. Is that what you're recommending or have I misunderstood?

Feb 07 '24 23:02 rhendric

Excuse me?

ghci> :m +Data.Foldable Data.List Data.List.NonEmpty
-- 1
ghci> Data.List.tails [1, 2, 3, 4]
[[1,2,3,4],[2,3,4],[3,4],[4],[]]
-- 2
ghci> Data.List.NonEmpty.tails (1 :| [2, 3, 4])
[1,2,3,4] :| [[2,3,4],[3,4],[4],[]]
-- 3
ghci> Data.List.NonEmpty.tails1 (1 :| [2, 3, 4])
(1 :| [2,3,4]) :| [2 :| [3,4],3 :| [4],4 :| []]
-- 4
ghci> Data.List.NonEmpty.toList $ Data.List.NonEmpty.tails1 (1 :| [2, 3, 4])
[1 :| [2,3,4],2 :| [3,4],3 :| [4],4 :| []]
-- 5
ghci> Data.List.NonEmpty.toList <$> Data.List.NonEmpty.tails1 (1 :| [2, 3, 4])
[1,2,3,4] :| [[2,3,4],[3,4],[4]]

ghci> let nonEmptyTails = fmap Data.List.NonEmpty.fromList . Data.List.init . Data.List.tails . Data.Foldable.toList
ghci> nonEmptyTails (1 :| [2, 3, 4])
[1 :| [2,3,4],2 :| [3,4],3 :| [4],4 :| []] -- same as 4
ghci> nonEmptyTails [1, 2, 3, 4]
[1 :| [2,3,4],2 :| [3,4],3 :| [4],4 :| []] -- same as 4

Feb 08 '24 18:02 mixphix

But nonEmptyTails also works on [a]; toList . tails1 doesn't.

Feb 08 '24 19:02 tomjaguarpaw

mapMaybe nonEmpty . Data.List.tails

This is not primitive, not useful, and not hard to get by without.

Feb 08 '24 20:02 mixphix

Again, these functions appear as subexpressions of the existing inits1 and tails1 implementations, which seems like evidence of usefulness to me, as well as evidence that these are, if not ‘primitive’, at least more so than the approved inits1 and tails1.

I agree we can get by without it. It's not hard to implement, but neither is it hard to implement inits1 or tails1. There are, I think, interesting performance questions for these functions—namely, do they interfere with list fusion, and do they waste cycles checking at run time emptiness properties that can be known at compile time. I don't think answering these questions is trivial. So the actual thing I would hope to save users from doing by including nonEmptyInits and nonEmptyTails in base is not the cost of implementing them, but the cost of determining which of their possible implementations works best with the rest of base for optimization purposes. (I don't claim that my proposed implementation is at all optimal in this sense, but once the CLC approves some implementation I can iterate with GHC on getting the best implementation.)

Regardless, base already contains an implementation of both functions. They're just contained within the definitions of inits1 and tails1. I would think that the barrier to exposing an existing implementation should be lower than the barrier to taking on an additional function.

Sorry if this is too trivial a thing to suggest.

Feb 08 '24 21:02 rhendric

Prior art is Data.ByteString.initsNE and Data.ByteString.tailsNE and work-in-progress PR, adding the same functions to text: https://github.com/haskell/text/pull/558.

I personally think that this is a useful addition. It's fairly common to encounter things like tail . inits, which would be nice to rewrite without partial functions as NE.tail . initsNE.

Feb 11 '24 18:02 Bodigrim

Sadly for me, those functions are analogous to the existing NonEmpty.inits, not my proposed variants. The ByteString analog of nonEmptyInits would return [NonEmptyByteString], if such a type were to exist.

Feb 11 '24 19:02 rhendric

Ah, my bad, sorry.

I'm not really sure about it then. If we can come up with good, self-explaining names then improving consistency and symmetry would be nice. But otherwise - as one can guess from my mistake above - there is a cost of offering users too many confusingly similar alternatives.

Feb 11 '24 20:02 Bodigrim

@rhendric how would you like to proceed? If you would like us to vote as is, please raise an MR.

May 12 '24 10:05 Bodigrim

If that's so, then perhaps this new combinator should be called List.inits1?

@tomjaguarpaw, I owe you an apology. I didn't consider this proposal seriously enough because I wasn't aware of how list types and functions are distributed among the various internal modules, and now that I've taken a look I see it's possible for things in Data.List to depend on the NonEmpty type. I've adopted your approach; reusing the inits1 and tails1 names in the List namespace strikes me as much clearer than any alternative.

MR is here: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12659

May 18 '24 07:05 rhendric

Dear CLC members, any non-binding opinions on this?

@tomjaguarpaw @mixphix @hasufell @angerman @parsonsmatt @velveteer

May 23 '24 21:05 Bodigrim

Are those expressions common idioms? I find it hard to follow the motivation. "Symmetry" is a bit vague.

Jun 01 '24 11:06 hasufell

The motivation is identical to that in #67, because these inits1 and tails1 are the same functions as those inits1 and tails1, just with the surrounding NonEmpty.fromList . / . Foldable.toList removed. There's no reason why the desire to get only non-empty inits/tails should be limited to inputs that are themselves non-empty.

I can quote the example from that earlier proposal verbatim, with one edit:

A concrete example where this would be useful is in the PureScript compiler: https://github.com/purescript/purescript/blob/8181c4fee34fdfa576ad7029ec2303f71020e1b6/src/Language/PureScript/TypeChecker/Entailment.hs#L261-L273 At the moment, we have code that uses regular lists, and does
for (init $ tails chain) $ \(tcd:tl) -> ...
where chain is a regular list which is known to be nonempty, and where we have an unsafe incomplete pattern match which doesn't blow up at runtime because of the init (which removes the empty list at the end of tails). With tails1, we could ~~change chain to be a NonEmpty, and~~ change the above line to
for (tails1 chain) $ \(tcd :| tl) -> ...
and have all of the non-emptiness facts that we are relying on tracked safely in the types.

The struck-out bit is the point of this proposal.

Jun 03 '24 01:06 rhendric

Weak +1 for symmetry (as in less surprising behaviour for the consumer). I guess I'm fundamentally not fully convinced that I want *1 shorthands over explicit composition of more basic building blocks. However that's mostly a very personal preference for me. Hence weak +1 if it helps others. It doesn't seem to be breaking existing code.

Jun 06 '24 04:06 angerman

@rhendric would you like to trigger a vote? Or is there more to discuss?

Jun 15 '24 16:06 Bodigrim

Yes, vote please. I've tried to amend the original description to reflect the points I raised along the way. I'm still not confident I have a good read on what's important to the CLC in terms of motivation, but there are both aesthetic (‘symmetry’) and practical arguments for these functions, and I may have given too much precedence to the former. Please review the top comment and vote on that and not on anything you thought I may have been selling over the last several months.

Jun 15 '24 16:06 rhendric

Dear CLC members, let's vote on the proposal to add inits1, tails1 :: [a] -> [NonEmpty a] to Data.List, which concludes work started in #67 to provide complete set of variants of inits and tails for empty and non-empty lists. Please read the executive summary in https://github.com/haskell/core-libraries-committee/issues/252#issue-2119832504 for more details.

@tomjaguarpaw @mixphix @angerman @hasufell @parsonsmatt @velveteer

Jun 16 '24 16:06 Bodigrim

I've been somewhat sceptical of the proposal, but today accidentally I was trying to build a certain variation of span and realised that I lack tails1 :: [a] -> NonEmpty a to avoid manual recursion.

The reason is that functions which return a tail of the list untouched (drop, splitAt, dropWhile, span, break, stripPrefix, deleteBy, insertBy to name a few) are usually expressible via paramorphism, which is a fold having access not only to the current element and accumulator but also to the rest of the list:

para :: forall a b. (a -> [a] -> b -> b) -> b -> [a] -> b
para f z = go
  where
    go :: [a] -> b
    go [] = z
    go (x : xs) = f x xs (go xs)

For example,

span :: (a -> Bool) -> [a] -> ([a], [a])
span p = para (\x xs -> if p x then first (x :) else const ([], x : xs)) ([], [])

We could have expressed span via foldr, but it would be inefficient, because it would have to traverse and rebuild the entire suffix of the list. But para allows us to shortcut and return the suffix untouched as soon as we reach it.

Now we don't have para anywhere near base, but what I realised is that para is just tails1 :: [a] -> NonEmpty a in disguise:

para :: forall a b. (a -> [a] -> b -> b) -> b -> [a] -> b
para f z = foldr (\(x :| xs) -> x `f` xs) z . tails1

(...list paramorphism is a composition of list catamorphism applied to NonEmpty catamorphism and tails1...)

So we can write

span :: (a -> Bool) -> [a] -> ([a], [a])
span p = foldr (\(x :| xs) -> if p x then first (x :) else const ([], x : xs)) ([], []) . tails1

Given that tails1 is defined via build, it all fuses perfectly well, eliminating any overhead for (:|). Let me demonstrate it on a simpler function:

dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p = foldr (\(x :| xs) -> if p x then id else const (x : xs)) [] . tails1

becomes

dropWhile :: forall a. (a -> Bool) -> [a] -> [a]
dropWhile
  = \ (@a_a1bp) (p_a10U :: a_a1bp -> Bool) (eta_B0 :: [a_a1bp]) ->
      joinrec {
        tails1Go_s1h6 :: [a_a1bp] -> [a_a1bp]
        tails1Go_s1h6 (ds_d1cJ :: [a_a1bp])
          = case ds_d1cJ of wild_X1 {
              [] -> [];
              : x_aBK xs_aBL ->
                case p_a10U x_aBK of {
                  False -> wild_X1;
                  True -> jump tails1Go_s1h6 xs_aBL
                }
            }; } in
      jump tails1Go_s1h6 eta_B0

So I'm strongly +1.

Jun 19 '24 23:06 Bodigrim

+1

Jun 20 '24 10:06 tomjaguarpaw

+1

Jun 20 '24 14:06 velveteer

Thanks for the exposition, @Bodigrim. I've come around on this proposal. +1

Jun 20 '24 15:06 mixphix

Still +1

Jun 20 '24 15:06 angerman

+1

Jun 20 '24 17:06 parsonsmatt

Thanks all, that's enough votes to approve.

Jun 20 '24 21:06 Bodigrim

core-libraries-committee core-libraries-committee copied to clipboard

Still more NonEmpty variants of inits & tails

core-libraries-committee
core-libraries-committee copied to clipboard