core-libraries-committee icon indicating copy to clipboard operation
core-libraries-committee copied to clipboard

Add `initMaybe`, `lastMaybe` and (stricter) `unsnoc'` to `Data.List`

Open mpilgrem opened this issue 1 year ago • 14 comments

Motivation: In connection with:

  • #292; and
  • (misconceived) #307

and a desire to give users complete alternatives to popular partial functions

this makes two related proposals, namely:

(1) add complete functions initMaybe :: [a] -> Maybe [a] and lastMaybe :: [a] -> Maybe a to Data.List; and

(2) add a unsnoc' :: [a] -> Maybe ([a], a) to Data.List that is stricter and more time-performant than existing unsnoc (e.g. like the ghc package's GHC.Utils.Misc.snocView - see further below);

together with appropriate Haddock documentation to assist users.

The use of a prime to indicate a stricter version of unsnoc would follow the pattern set by foldl/foldl' and foldr/foldr'.

Looking at Hoogle and the existing use of these names:

  • initMaybe and lastMaybe are provided by some or all of three packages rio (in part, a Prelude replacement), ghc, ghc-lib-parser and Agda - filling a perceived gap in base; and
  • package sized (which has only two reverse dependencies on Hackage) provides Data.Sized.unsnoc' :: forall f (n :: Nat) a proxy. (KnownNat n, CFreeMonoid f, Dom f a) => proxy n -> Sized f (Succ n) a -> Unsnoc f (Succ n) a. Module Data.Sized already exports many functions with names that clash with those provided by the Prelude, so I assume it is intended to be used imported qualified.

GHC.Utils.Misc.snocView has:

-- | Split a list into its last element and the initial part of the list.
-- @snocView xs = Just (init xs, last xs)@ for non-empty lists.
-- @snocView xs = Nothing@ otherwise.
-- Unless both parts of the result are guaranteed to be used
-- prefer separate calls to @last@ + @init@.
-- If you are guaranteed to use both, this will
-- be more efficient.
snocView :: [a] -> Maybe ([a], a)
snocView = fmap go . nonEmpty
  where
    go :: NonEmpty a -> ([a], a)
    go (x :| xs) = case nonEmpty xs of
        Nothing -> ([], x)
        Just xs -> case go xs of !(xs', x') -> (x : xs', x')

mpilgrem avatar Feb 08 '25 16:02 mpilgrem

Apologies for the prolonged absence of feedback. The CLC has been switching members.

If these functions made it into Data.List, I think they would get a lot of use. I suspect people who want to avoid an error thrown by passing an empty list to init or last are either pattern-matching the list, defining functions equivalent to the proposed initMaybe and lastMaybe, or just using a different type entirely (e.g. NonEmpty). Except for the last one of these cases, having these functions defined would be helpful.

On breakage, I suspect this would only break code that imports Data.List without an explicit import list and define (or import) functions with the same names as these, most likely with the same semantics as well.

As for snoc', I can't help but think that lists are not well-fitted for this purpose, but I'm sure this functionality is needed sometimes (as proved by the GHC example).

Daniel-Diaz avatar Feb 23 '25 10:02 Daniel-Diaz

There's a few more packages that seem to have definitions of things with the same name.

It looks like initMaybe is additionally defined in the subhask package.

lastMaybe is also defined in (some of these use this name for a non List function):

Agda-2.6.4.3
cabal-clean-0.2.20230609
calamity-0.11.0.0
calamity-commands-0.4.1.0
ghc-9.10.1
ghc-lib-parser-9.10.1.20240511
git-annex-10.20240430
github-backup-1.20200721
git-repair-1.20230814
haskell-language-server-2.8.0.0
hls-refactor-plugin-2.6.0.0
ipython-kernel-0.11.0.0
propellor-5.17
range-0.3.0.2
rio-0.1.22.0
stdio-0.2.0.0
subhask-0.1.1.0
Z-Data-2.0.0.2

unsnoc' is defined in:

sdp-0.2.1.1
sized-1.1.0.1
uniquely-represented-sets-0.1.0.0

As long as these packages get patched, I think this is a good change!

For packages like rio that export these identifiers, I think they should be patched so that they re-export the identifier from Data.List (guarded by CPP). This should avoid conflicting identifier warnings in downstream packages.

TeofilC avatar Feb 23 '25 10:02 TeofilC

I looked at the packages identified by @TeofilC. My notes are in the table below. 'Likely imported qualified' indicates that the exporting module exports a lot of functions with names that clash with Prelude exports.

I think they are all manageable, and if this were approved by the CLC, I would be happy to work through the list.

Package version initMaybe lastMaybe unsnoc’
Agda-2.6.4.3 Same functionality Same functionality  
cabal-clean-0.2.20230609   Executable-only package  
calamity-0.11.0.0   Different functionality, but ‘internal’ utility  
calamity-commands-0.4.1.0   Different functionality, but ‘internal’ utility  
ghc-9.10.1   Same functionality  
ghc-lib-parser-9.10.1.20240511   Same functionality  
git-annex-10.20240430   Executable-only package  
github-backup-1.20200721   Executable-only package  
git-repair-1.20230814   Executable-only package  
haskell-language-server-2.8.0.0   Same functionality  
hls-refactor-plugin-2.6.0.0   Same functionality  
ipython-kernel-0.11.0.0   Not top-level function  
propellor-5.17   Same functionality  
range-0.3.0.2   Not top-level function  
rio-0.1.22.0 A Prelude replacement (same functionality) A Prelude replacement (same functionality)  
stdio-0.2.0.0   Different functionality but likely imported qualified.  
subhask-0.1.1.0 A Prelude replacement. A Prelude replacement.  
Z-Data-2.0.0.2   Different functionality but likely imported qualified  
sdp-0.2.1.1     Different functionality, but likely imported qualified
sized-1.1.0.1     Different functionality, but likely imported qualified
uniquely-represented-sets-0.1.0.0     Different functionality, but likely imported qualified

mpilgrem avatar Feb 23 '25 22:02 mpilgrem

I think initMaybe and lastMaybe would be good to have. A good implementation that works with fusion may require some thought.

I'm not so sure about unsnoc'. For one, it is confusing to have both unsnoc and unsnoc', just like foldl and foldl' confuses beginners. Except the relation of unsnoc to unsnoc' is not really the same as foldl to foldl', which makes things even more confusing. Documentation may be able to help to some extent.
Secondly, there are other functions which are similarly inefficient due to their laziness, such as splitAt, span, partition. Do they also deserve strict counterparts?

meooow25 avatar Mar 03 '25 18:03 meooow25

@meooow25, putting aside Haddock documentation for the moment, does this 'work' from that perspective?

(EDIT: I realised my original version would end up with circular module dependencies)

import           GHC.Internal.Data.List.NonEmpty ( NonEmpty (..) )
import qualified GHC.Internal.List as List

-- nonEmpty is billed as "efficiently turns a normal list into a NonEmpty stream"
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty []       = Nothing
nonEmpty (a : as) = Just (a :| as)

maybeInit :: [a] -> Maybe [a]
maybeInit = fmap init' . nonEmpty
 where
  init' ~(a :| as) = List.init (a : as) -- Taken from Data.List.NonEmpty

-- Contrast with GHC's version:
--  
-- lastMaybe [] = Nothing
-- lastMaybe (x : xs) = Just $ NE.last (x :| xs)
--
maybeLast :: [a] -> Maybe a
maybeLast = fmap last' . nonEmpty
 where
  last' ~(a :| as) = List.last (a : as) -- Taken from Data.List.NonEmpty

-- GHC's version (above), renamed
unsnoc' :: [a] -> Maybe ([a], a)
unsnoc' = fmap go . nonEmpty
  where
    go :: NonEmpty a -> ([a], a)
    go (x :| xs) = case nonEmpty xs of
        Nothing -> ([], x)
        Just xs -> case go xs of !(xs', x') -> (x : xs', x')

mpilgrem avatar Mar 03 '25 20:03 mpilgrem

Secondly, there are other functions which are similarly inefficient due to their laziness, such as splitAt, span, partition. Do they also deserve strict counterparts?

See https://github.com/haskell/core-libraries-committee/issues/133

Bodigrim avatar Mar 03 '25 20:03 Bodigrim

I think initMaybe and lastMaybe would be good to have. A good implementation that works with fusion may require some thought.

lastMaybe supports two different implementations, a lazy one and a strict one. The strict one walks the whole list before producing Just. The obvious way to try to fuse it is

lastMaybe xs = foldl (\_ a -> Just a) Nothing xs
{-# INLINABLE lastMaybe #-}

This relies on construction specialization to erase the Just constructors. Unfortunately, even compiling with -O2 (required for constructor specialization), the Just constructors remain. See GHC issue 25808, which I just opened.

The only way I was able to get this to work properly was to use unboxed sums:

lastMaybe xs = toMaybe (foldr (\a r _ -> r (# | a #)) (\m -> m) xs (# (##) | #)
{-# INLINABLE lastMaybe #-}

toMaybe (# (##) | #) = Nothing
toMaybe (# | a #) = Just a

For whatever reason, GHC has no trouble erasing the unboxed sums with -O2, and even with -O1 they won't allocate.


The lazy version of lastMaybe produces Just as soon as it sees the first cons. I don't see any obvious way to fuse that. It can be written as a fold like this, but then GHC can't erase the Justs. I think the problem is unavoidable, but I'd love to be proven wrong.

lastMaybe xs = foldr (\a r -> Just (fromMaybe a r)) Nothing xs

I will try to look at initMaybe as well. I thought I'd start with lastMaybe because I was sure it would be easy; I wasn't expecting the compiler to make such lousy code from something so simple.

treeowl avatar Mar 04 '25 01:03 treeowl

@meooow25, I don't think initMaybe can be a good consumer, while maintaining the laziness that's surely expected of it. The problem is going to be similar to the lazy version of lastMaybe in that regard. Interestingly, it does seem to work to make it a good producer, at least in sufficiently simple situations.

initMaybe :: [a] -> Maybe [a]
initMaybe [] = Nothing
initMaybe (a : as) = Just $ initMaybe' a as
{-# INLINABLE initMaybe #-}

initMaybe' :: a -> [a] -> [a]
initMaybe' a0 as0 = build $ \c n ->
  let
    go _a [] = n
    go a (b : bs) = a `c` go b bs
  in go a0 as0
{-# INLINE initMaybe' #-}

When I test this with something simple, it seems to elide the intermediate list:

flub :: [Int] -> Int
flub xs = case initMaybe xs of
  Just ys -> sum ys
  Nothing -> 0

Caveat: I'm a bit tired, so someone should verify that I'm reading the Core right, but I'm pretty sure I am.

treeowl avatar Mar 04 '25 06:03 treeowl

@meooow25, putting aside Haddock documentation for the moment, does this 'work' from that perspective?

Your maybeInit looks about right, but you'll have to grovel over Core to verify that it will actually fuse properly. Your maybeLast is the lazy version, which may or may not be what people want. The main problem with the lazy version of maybeLast is that it would be easy to accidentally leak memory if the resulting element is stored without forcing it. For example, suppose you do something like this:

case maybeLast xs of
  Nothing -> -- whatever
  Just x -> writeIORef ref x

x is a thunk that holds the entire xs list, and will continue to do so unless and until it is eventually forced. I suspect the lazy version is the wrong default, but certainly it can be useful sometimes.

treeowl avatar Mar 04 '25 06:03 treeowl

@meooow25, putting aside Haddock documentation for the moment, does this 'work' from that perspective?

To be a good consumer, it must use foldr on the input list (or something which inlines into a foldr). And to be a good producer, the list must be created with build (or something which inlines to a build).

maybeInit doesn't use foldr or build, so it does not fuse either way. maybeLast uses foldl (which inlines into a foldr), but since the input list is also scrutinized by nonEmpty it does not fuse.


treeowl's strict lastMaybe looks like a good implementation. I also think that's a better default than the lazy one.
~~I would tweak the definition a bit.~~

-lastMaybe xs = toMaybe (foldr (\a r _ -> r (# | a #)) (\m -> m) xs (# (##) | #))
+lastMaybe xs =          foldr (\a r _ -> r (# | a #)) toMaybe   xs (# (##) | #)

~~This is what GHC seems to optimize it to anyway.~~ Edit: I see the benefit of not doing this. If the Maybe is matched on, it can be eliminated.

@meooow25, I don't think initMaybe can be a good consumer, while maintaining the laziness that's surely expected of it.

It can be done just like your lazy lastMaybe.

-- lastMaybe xs = foldr (\a r -> Just (fromMaybe a r)) Nothing xs
initMaybe xs = foldr (\a r -> Just (maybe [] (a:) r)) Nothing xs

It's probably not a good idea, for the same reason. We can't get rid of the Justs.

The good producer implementation seems better.


One thing to note that Data.List.init is neither a good producer nor a good consumer (source). So maybe initMaybe doesn't have to fuse either, though it would be nice if it did.

Data.List.last does fuse and I think lastMaybe should too.

meooow25 avatar Mar 06 '25 20:03 meooow25

One thing to note that Data.List.init is neither a good producer nor a good consumer (source). So maybe initMaybe doesn't have to fuse either, though it would be nice if it did.

It's pretty surprising that the plain init doesn't fuse. I wonder why that is. We can probably just write

init xs = build $ \c n -> foldr go stop xs Nothing
  where
    go a r Nothing = r (Just a)
    go a r (Just prev) = prev `c` r (Just a)
    stop Nothing = error "init: empty list"
    stop (Just _) = n

(Or use rewrite rules to rewrite to that and write back and so on.)


As it turns out, the unboxed sums aren't totally necessary to achieve good optimization for initMaybe—conversion to another Maybe-like type does the same thing, or it's possible to use a foldr argument that deals with the magical SPEC type. But unboxed sums should be more efficient when compiling with -O1, when constructor specialization doesn't apply, so it may make sense to use those for both initMaybe and init, if testing validates a performance benefit. If we do end up using unboxed sums, we should probably consider using some nice wrappers for clarity:

newtype Maybe# a = M# (# (##) | a #)

pattern Nothing# :: Maybe# a
pattern Nothing# = M# (# (##) | #)

pattern Just# :: a -> Maybe# a
pattern (Just# a) = M# (# | a #)

treeowl avatar Mar 06 '25 21:03 treeowl

@mpilgrem have you been able to experiment with the various implementation suggestions?

hasufell avatar Mar 24 '25 11:03 hasufell

@hasufell, I have not - but I was waiting for the discussion of alternatives to settle. I don't have experience myself in 'tweaking for space/time performance'; I'm usually in camp 'trust the compiler'.

mpilgrem avatar Mar 25 '25 22:03 mpilgrem

I wonder if it's possible to have a preliminary implementation in ghc-experimental for more exposure and experimentation. You don't need CLC approval to do that.

If you raise such a PR you may also get more feedback from GHC developers.

hasufell avatar Mar 28 '25 06:03 hasufell

I think ghc-experimental is more like for GHC bleeding edge features, not for bog standard Haskell98 functions.

The proposal mentions prior art in other packages such as rio, Agda and ghc-the-library. I don't think one would get measurably more feedback from implementing it yet elsewhere.

All three mentioned libraries define (up to syntactic differences and inlining) lastMaybe xs = if null xs then Nothing else Just (Data.List.last xs). I'd suggest to do the same as the least surprising choice (both semantically and operationally), and similar for initMaybe: check for emptiness, then delegate the rest of the work to existing functions. So https://github.com/haskell/core-libraries-committee/issues/316#issuecomment-2695441334 looks reasonable to me.

Not really sure if there is much to gain from fusion and / or unboxed sums. It's an interesting exercise, but eventually if you are so dependent on fast init / last, lists are a very wrong data structure.

My take is:

  • lastMaybe is good, because it's an asymptotic improvement in allocations over existing unsnoc: O(1) vs O(n). Admittedly, these are cheap allocations in nursery, but nevertheless.
  • initMaybe is less convincing to me, because it's only a constant factor improvement over unsnoc (both for time and allocations). But if we go for lastMaybe, adding initMaybe makes the API look consistent.
  • unsnoc' is unconvincing. Minor improvement in a constant factor, but lots of documentation to write and lots of educational work to explain how to choose between unsnoc and unsnoc'. People can barely tell foldl from foldl', so I feel that making another prime distinction in a relatively obscure corner will cause more harm than good.

Bodigrim avatar Mar 28 '25 23:03 Bodigrim

I was waiting for the discussion of alternatives to settle.

FWIW, I believe treeowl and I agree on the desirable strictness and fusion behavior of lastMaybe and initMaybe. I'll summarize:

lastMaybe :: [a] -> Maybe a
lastMaybe xs = toMaybe (foldr (\a r _ -> r (# | a #)) (\m -> m) xs (# (##) | #))
{-# INLINE lastMaybe #-} -- Inline for list fusion

toMaybe :: (# (##) | a #) -> Maybe a
toMaybe (# (##) | #) = Nothing
toMaybe (# | a #) = Just a
initMaybe :: [a] -> Maybe [a]
initMaybe [] = Nothing
initMaybe (a0 : as0) = Just $ build $ \c n ->
  let go _a [] = n
      go a (b : bs) = a `c` go b bs
  in go a0 as0
{-# INLINE initMaybe #-} -- Inline for list fusion

If you raise such a PR you may also get more feedback from GHC developers.

+1, it would be good to know if GHC developers have comments on the implementations.

meooow25 avatar Mar 29 '25 11:03 meooow25

All three mentioned libraries define (up to syntactic differences and inlining) lastMaybe xs = if null xs then Nothing else Just (Data.List.last xs). I'd suggest to do the same as the least surprising choice (both semantically and operationally), and similar for initMaybe: check for emptiness, then delegate the rest of the work to existing functions.

I think base should be held to higher standards than other libraries, especially when it comes to properties like laziness and efficiency which can be tricky to get right. We do not know how much thought went into the definitions in other libraries. Before being added to base, they need to examined and the definitions changed if there is benefit in doing so.

meooow25 avatar Mar 29 '25 11:03 meooow25

Are there any other opinions on the specific implementation of lastMaybe and initMaybe?

@angerman @parsonsmatt @velveteer @TeofilC @Daniel-Diaz @ChickenProp @noughtmare @mpscholten @doyougnu


I mostly trust container maintainers @treeowl and @meooow25 on such matters. I'm also fine with starting with a simple implementation as suggested by @Bodigrim and doing an iteration sometime later.

hasufell avatar Apr 11 '25 13:04 hasufell

it would be good to know if GHC developers have comments on the implementations.

@simonpj @sgraf812 do you possibly have any comments on https://github.com/haskell/core-libraries-committee/issues/316#issuecomment-2763300644?

Bodigrim avatar Apr 12 '25 00:04 Bodigrim

Looks good to me. It is unclear to me whether INLINE or INLINABLE is best. I have wondered whether initMaybe can be a better consumer, but don't see how. At any rate it seems to be a better producer than init already.

sgraf812 avatar Apr 12 '25 11:04 sgraf812

FWIW I do not insist on a simpler implementation. If, as it seems, people have settled on a more advanced implementation, I'm all for it.

Bodigrim avatar Apr 20 '25 10:04 Bodigrim

@mpilgrem do you need further guidance to finalise your implementation choice?

hasufell avatar Apr 20 '25 11:04 hasufell

@hasufell, I'll have a go at an implementation - but it may be a little while hence, as there are some Stack things that is occupying me at present.

mpilgrem avatar Apr 22 '25 23:04 mpilgrem

@mpilgrem gentle reminder

hasufell avatar May 13 '25 12:05 hasufell

Thanks! Genuinely not forgotten, but a question of limited bandwidth and having to prioritise various things.

mpilgrem avatar May 13 '25 13:05 mpilgrem

@mpilgrem I'll close this for now, feel free to re-open once you have more bandwidth.

hasufell avatar May 27 '25 16:05 hasufell