Add `initMaybe`, `lastMaybe` and (stricter) `unsnoc'` to `Data.List`
Motivation: In connection with:
- #292; and
- (misconceived) #307
and a desire to give users complete alternatives to popular partial functions
this makes two related proposals, namely:
(1) add complete functions initMaybe :: [a] -> Maybe [a] and lastMaybe :: [a] -> Maybe a to Data.List; and
(2) add a unsnoc' :: [a] -> Maybe ([a], a) to Data.List that is stricter and more time-performant than existing unsnoc (e.g. like the ghc package's GHC.Utils.Misc.snocView - see further below);
together with appropriate Haddock documentation to assist users.
The use of a prime to indicate a stricter version of unsnoc would follow the pattern set by foldl/foldl' and foldr/foldr'.
Looking at Hoogle and the existing use of these names:
-
initMaybeandlastMaybeare provided by some or all of three packagesrio(in part, aPreludereplacement),ghc,ghc-lib-parserandAgda- filling a perceived gap inbase; and - package
sized(which has only two reverse dependencies on Hackage) providesData.Sized.unsnoc' :: forall f (n :: Nat) a proxy. (KnownNat n, CFreeMonoid f, Dom f a) => proxy n -> Sized f (Succ n) a -> Unsnoc f (Succ n) a. ModuleData.Sizedalready exports many functions with names that clash with those provided by thePrelude, so I assume it is intended to be used imported qualified.
GHC.Utils.Misc.snocView has:
-- | Split a list into its last element and the initial part of the list.
-- @snocView xs = Just (init xs, last xs)@ for non-empty lists.
-- @snocView xs = Nothing@ otherwise.
-- Unless both parts of the result are guaranteed to be used
-- prefer separate calls to @last@ + @init@.
-- If you are guaranteed to use both, this will
-- be more efficient.
snocView :: [a] -> Maybe ([a], a)
snocView = fmap go . nonEmpty
where
go :: NonEmpty a -> ([a], a)
go (x :| xs) = case nonEmpty xs of
Nothing -> ([], x)
Just xs -> case go xs of !(xs', x') -> (x : xs', x')
Apologies for the prolonged absence of feedback. The CLC has been switching members.
If these functions made it into Data.List, I think they would get a lot of use. I suspect people who want to avoid an error thrown by passing an empty list to init or last are either pattern-matching the list, defining functions equivalent to the proposed initMaybe and lastMaybe, or just using a different type entirely (e.g. NonEmpty). Except for the last one of these cases, having these functions defined would be helpful.
On breakage, I suspect this would only break code that imports Data.List without an explicit import list and define (or import) functions with the same names as these, most likely with the same semantics as well.
As for snoc', I can't help but think that lists are not well-fitted for this purpose, but I'm sure this functionality is needed sometimes (as proved by the GHC example).
There's a few more packages that seem to have definitions of things with the same name.
It looks like initMaybe is additionally defined in the subhask package.
lastMaybe is also defined in (some of these use this name for a non List function):
Agda-2.6.4.3
cabal-clean-0.2.20230609
calamity-0.11.0.0
calamity-commands-0.4.1.0
ghc-9.10.1
ghc-lib-parser-9.10.1.20240511
git-annex-10.20240430
github-backup-1.20200721
git-repair-1.20230814
haskell-language-server-2.8.0.0
hls-refactor-plugin-2.6.0.0
ipython-kernel-0.11.0.0
propellor-5.17
range-0.3.0.2
rio-0.1.22.0
stdio-0.2.0.0
subhask-0.1.1.0
Z-Data-2.0.0.2
unsnoc' is defined in:
sdp-0.2.1.1
sized-1.1.0.1
uniquely-represented-sets-0.1.0.0
As long as these packages get patched, I think this is a good change!
For packages like rio that export these identifiers, I think they should be patched so that they re-export the identifier from Data.List (guarded by CPP). This should avoid conflicting identifier warnings in downstream packages.
I looked at the packages identified by @TeofilC. My notes are in the table below. 'Likely imported qualified' indicates that the exporting module exports a lot of functions with names that clash with Prelude exports.
I think they are all manageable, and if this were approved by the CLC, I would be happy to work through the list.
| Package version | initMaybe |
lastMaybe |
unsnoc’ |
|---|---|---|---|
| Agda-2.6.4.3 | Same functionality | Same functionality | |
| cabal-clean-0.2.20230609 | Executable-only package | ||
| calamity-0.11.0.0 | Different functionality, but ‘internal’ utility | ||
| calamity-commands-0.4.1.0 | Different functionality, but ‘internal’ utility | ||
| ghc-9.10.1 | Same functionality | ||
| ghc-lib-parser-9.10.1.20240511 | Same functionality | ||
| git-annex-10.20240430 | Executable-only package | ||
| github-backup-1.20200721 | Executable-only package | ||
| git-repair-1.20230814 | Executable-only package | ||
| haskell-language-server-2.8.0.0 | Same functionality | ||
| hls-refactor-plugin-2.6.0.0 | Same functionality | ||
| ipython-kernel-0.11.0.0 | Not top-level function | ||
| propellor-5.17 | Same functionality | ||
| range-0.3.0.2 | Not top-level function | ||
| rio-0.1.22.0 | A Prelude replacement (same functionality) | A Prelude replacement (same functionality) | |
| stdio-0.2.0.0 | Different functionality but likely imported qualified. | ||
| subhask-0.1.1.0 | A Prelude replacement. | A Prelude replacement. | |
| Z-Data-2.0.0.2 | Different functionality but likely imported qualified | ||
| sdp-0.2.1.1 | Different functionality, but likely imported qualified | ||
| sized-1.1.0.1 | Different functionality, but likely imported qualified | ||
| uniquely-represented-sets-0.1.0.0 | Different functionality, but likely imported qualified |
I think initMaybe and lastMaybe would be good to have. A good implementation that works with fusion may require some thought.
I'm not so sure about unsnoc'. For one, it is confusing to have both unsnoc and unsnoc', just like foldl and foldl' confuses beginners. Except the relation of unsnoc to unsnoc' is not really the same as foldl to foldl', which makes things even more confusing. Documentation may be able to help to some extent.
Secondly, there are other functions which are similarly inefficient due to their laziness, such as splitAt, span, partition. Do they also deserve strict counterparts?
@meooow25, putting aside Haddock documentation for the moment, does this 'work' from that perspective?
(EDIT: I realised my original version would end up with circular module dependencies)
import GHC.Internal.Data.List.NonEmpty ( NonEmpty (..) )
import qualified GHC.Internal.List as List
-- nonEmpty is billed as "efficiently turns a normal list into a NonEmpty stream"
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty (a : as) = Just (a :| as)
maybeInit :: [a] -> Maybe [a]
maybeInit = fmap init' . nonEmpty
where
init' ~(a :| as) = List.init (a : as) -- Taken from Data.List.NonEmpty
-- Contrast with GHC's version:
--
-- lastMaybe [] = Nothing
-- lastMaybe (x : xs) = Just $ NE.last (x :| xs)
--
maybeLast :: [a] -> Maybe a
maybeLast = fmap last' . nonEmpty
where
last' ~(a :| as) = List.last (a : as) -- Taken from Data.List.NonEmpty
-- GHC's version (above), renamed
unsnoc' :: [a] -> Maybe ([a], a)
unsnoc' = fmap go . nonEmpty
where
go :: NonEmpty a -> ([a], a)
go (x :| xs) = case nonEmpty xs of
Nothing -> ([], x)
Just xs -> case go xs of !(xs', x') -> (x : xs', x')
Secondly, there are other functions which are similarly inefficient due to their laziness, such as
splitAt,span,partition. Do they also deserve strict counterparts?
See https://github.com/haskell/core-libraries-committee/issues/133
I think
initMaybeandlastMaybewould be good to have. A good implementation that works with fusion may require some thought.
lastMaybe supports two different implementations, a lazy one and a strict one. The strict one walks the whole list before producing Just. The obvious way to try to fuse it is
lastMaybe xs = foldl (\_ a -> Just a) Nothing xs
{-# INLINABLE lastMaybe #-}
This relies on construction specialization to erase the Just constructors. Unfortunately, even compiling with -O2 (required for constructor specialization), the Just constructors remain. See GHC issue 25808, which I just opened.
The only way I was able to get this to work properly was to use unboxed sums:
lastMaybe xs = toMaybe (foldr (\a r _ -> r (# | a #)) (\m -> m) xs (# (##) | #)
{-# INLINABLE lastMaybe #-}
toMaybe (# (##) | #) = Nothing
toMaybe (# | a #) = Just a
For whatever reason, GHC has no trouble erasing the unboxed sums with -O2, and even with -O1 they won't allocate.
The lazy version of lastMaybe produces Just as soon as it sees the first cons. I don't see any obvious way to fuse that. It can be written as a fold like this, but then GHC can't erase the Justs. I think the problem is unavoidable, but I'd love to be proven wrong.
lastMaybe xs = foldr (\a r -> Just (fromMaybe a r)) Nothing xs
I will try to look at initMaybe as well. I thought I'd start with lastMaybe because I was sure it would be easy; I wasn't expecting the compiler to make such lousy code from something so simple.
@meooow25, I don't think initMaybe can be a good consumer, while maintaining the laziness that's surely expected of it. The problem is going to be similar to the lazy version of lastMaybe in that regard. Interestingly, it does seem to work to make it a good producer, at least in sufficiently simple situations.
initMaybe :: [a] -> Maybe [a]
initMaybe [] = Nothing
initMaybe (a : as) = Just $ initMaybe' a as
{-# INLINABLE initMaybe #-}
initMaybe' :: a -> [a] -> [a]
initMaybe' a0 as0 = build $ \c n ->
let
go _a [] = n
go a (b : bs) = a `c` go b bs
in go a0 as0
{-# INLINE initMaybe' #-}
When I test this with something simple, it seems to elide the intermediate list:
flub :: [Int] -> Int
flub xs = case initMaybe xs of
Just ys -> sum ys
Nothing -> 0
Caveat: I'm a bit tired, so someone should verify that I'm reading the Core right, but I'm pretty sure I am.
@meooow25, putting aside Haddock documentation for the moment, does this 'work' from that perspective?
Your maybeInit looks about right, but you'll have to grovel over Core to verify that it will actually fuse properly. Your maybeLast is the lazy version, which may or may not be what people want. The main problem with the lazy version of maybeLast is that it would be easy to accidentally leak memory if the resulting element is stored without forcing it. For example, suppose you do something like this:
case maybeLast xs of
Nothing -> -- whatever
Just x -> writeIORef ref x
x is a thunk that holds the entire xs list, and will continue to do so unless and until it is eventually forced. I suspect the lazy version is the wrong default, but certainly it can be useful sometimes.
@meooow25, putting aside Haddock documentation for the moment, does this 'work' from that perspective?
To be a good consumer, it must use foldr on the input list (or something which inlines into a foldr). And to be a good producer, the list must be created with build (or something which inlines to a build).
maybeInit doesn't use foldr or build, so it does not fuse either way.
maybeLast uses foldl (which inlines into a foldr), but since the input list is also scrutinized by nonEmpty it does not fuse.
treeowl's strict lastMaybe looks like a good implementation. I also think that's a better default than the lazy one.
~~I would tweak the definition a bit.~~
-lastMaybe xs = toMaybe (foldr (\a r _ -> r (# | a #)) (\m -> m) xs (# (##) | #))
+lastMaybe xs = foldr (\a r _ -> r (# | a #)) toMaybe xs (# (##) | #)
~~This is what GHC seems to optimize it to anyway.~~
Edit: I see the benefit of not doing this. If the Maybe is matched on, it can be eliminated.
@meooow25, I don't think
initMaybecan be a good consumer, while maintaining the laziness that's surely expected of it.
It can be done just like your lazy lastMaybe.
-- lastMaybe xs = foldr (\a r -> Just (fromMaybe a r)) Nothing xs
initMaybe xs = foldr (\a r -> Just (maybe [] (a:) r)) Nothing xs
It's probably not a good idea, for the same reason. We can't get rid of the Justs.
The good producer implementation seems better.
One thing to note that Data.List.init is neither a good producer nor a good consumer (source). So maybe initMaybe doesn't have to fuse either, though it would be nice if it did.
Data.List.last does fuse and I think lastMaybe should too.
One thing to note that
Data.List.initis neither a good producer nor a good consumer (source). So maybeinitMaybedoesn't have to fuse either, though it would be nice if it did.
It's pretty surprising that the plain init doesn't fuse. I wonder why that is. We can probably just write
init xs = build $ \c n -> foldr go stop xs Nothing
where
go a r Nothing = r (Just a)
go a r (Just prev) = prev `c` r (Just a)
stop Nothing = error "init: empty list"
stop (Just _) = n
(Or use rewrite rules to rewrite to that and write back and so on.)
As it turns out, the unboxed sums aren't totally necessary to achieve good optimization for initMaybe—conversion to another Maybe-like type does the same thing, or it's possible to use a foldr argument that deals with the magical SPEC type. But unboxed sums should be more efficient when compiling with -O1, when constructor specialization doesn't apply, so it may make sense to use those for both initMaybe and init, if testing validates a performance benefit. If we do end up using unboxed sums, we should probably consider using some nice wrappers for clarity:
newtype Maybe# a = M# (# (##) | a #)
pattern Nothing# :: Maybe# a
pattern Nothing# = M# (# (##) | #)
pattern Just# :: a -> Maybe# a
pattern (Just# a) = M# (# | a #)
@mpilgrem have you been able to experiment with the various implementation suggestions?
@hasufell, I have not - but I was waiting for the discussion of alternatives to settle. I don't have experience myself in 'tweaking for space/time performance'; I'm usually in camp 'trust the compiler'.
I wonder if it's possible to have a preliminary implementation in ghc-experimental for more exposure and experimentation. You don't need CLC approval to do that.
If you raise such a PR you may also get more feedback from GHC developers.
I think ghc-experimental is more like for GHC bleeding edge features, not for bog standard Haskell98 functions.
The proposal mentions prior art in other packages such as rio, Agda and ghc-the-library. I don't think one would get measurably more feedback from implementing it yet elsewhere.
All three mentioned libraries define (up to syntactic differences and inlining) lastMaybe xs = if null xs then Nothing else Just (Data.List.last xs). I'd suggest to do the same as the least surprising choice (both semantically and operationally), and similar for initMaybe: check for emptiness, then delegate the rest of the work to existing functions. So https://github.com/haskell/core-libraries-committee/issues/316#issuecomment-2695441334 looks reasonable to me.
Not really sure if there is much to gain from fusion and / or unboxed sums. It's an interesting exercise, but eventually if you are so dependent on fast init / last, lists are a very wrong data structure.
My take is:
-
lastMaybeis good, because it's an asymptotic improvement in allocations over existingunsnoc: O(1) vs O(n). Admittedly, these are cheap allocations in nursery, but nevertheless. -
initMaybeis less convincing to me, because it's only a constant factor improvement overunsnoc(both for time and allocations). But if we go forlastMaybe, addinginitMaybemakes the API look consistent. -
unsnoc'is unconvincing. Minor improvement in a constant factor, but lots of documentation to write and lots of educational work to explain how to choose betweenunsnocandunsnoc'. People can barely tellfoldlfromfoldl', so I feel that making another prime distinction in a relatively obscure corner will cause more harm than good.
I was waiting for the discussion of alternatives to settle.
FWIW, I believe treeowl and I agree on the desirable strictness and fusion behavior of lastMaybe and initMaybe. I'll summarize:
lastMaybe :: [a] -> Maybe a
lastMaybe xs = toMaybe (foldr (\a r _ -> r (# | a #)) (\m -> m) xs (# (##) | #))
{-# INLINE lastMaybe #-} -- Inline for list fusion
toMaybe :: (# (##) | a #) -> Maybe a
toMaybe (# (##) | #) = Nothing
toMaybe (# | a #) = Just a
initMaybe :: [a] -> Maybe [a]
initMaybe [] = Nothing
initMaybe (a0 : as0) = Just $ build $ \c n ->
let go _a [] = n
go a (b : bs) = a `c` go b bs
in go a0 as0
{-# INLINE initMaybe #-} -- Inline for list fusion
If you raise such a PR you may also get more feedback from GHC developers.
+1, it would be good to know if GHC developers have comments on the implementations.
All three mentioned libraries define (up to syntactic differences and inlining)
lastMaybe xs = if null xs then Nothing else Just (Data.List.last xs). I'd suggest to do the same as the least surprising choice (both semantically and operationally), and similar forinitMaybe: check for emptiness, then delegate the rest of the work to existing functions.
I think base should be held to higher standards than other libraries, especially when it comes to properties like laziness and efficiency which can be tricky to get right. We do not know how much thought went into the definitions in other libraries. Before being added to base, they need to examined and the definitions changed if there is benefit in doing so.
Are there any other opinions on the specific implementation of lastMaybe and initMaybe?
@angerman @parsonsmatt @velveteer @TeofilC @Daniel-Diaz @ChickenProp @noughtmare @mpscholten @doyougnu
I mostly trust container maintainers @treeowl and @meooow25 on such matters. I'm also fine with starting with a simple implementation as suggested by @Bodigrim and doing an iteration sometime later.
it would be good to know if GHC developers have comments on the implementations.
@simonpj @sgraf812 do you possibly have any comments on https://github.com/haskell/core-libraries-committee/issues/316#issuecomment-2763300644?
Looks good to me. It is unclear to me whether INLINE or INLINABLE is best. I have wondered whether initMaybe can be a better consumer, but don't see how. At any rate it seems to be a better producer than init already.
FWIW I do not insist on a simpler implementation. If, as it seems, people have settled on a more advanced implementation, I'm all for it.
@mpilgrem do you need further guidance to finalise your implementation choice?
@hasufell, I'll have a go at an implementation - but it may be a little while hence, as there are some Stack things that is occupying me at present.
@mpilgrem gentle reminder
Thanks! Genuinely not forgotten, but a question of limited bandwidth and having to prioritise various things.
@mpilgrem I'll close this for now, feel free to re-open once you have more bandwidth.