core-libraries-committee icon indicating copy to clipboard operation
core-libraries-committee copied to clipboard

Add Data.List[.NonEmpty].unfoldM

Open 414owen opened this issue 1 year ago • 24 comments

Data.List exposes foldM, but not unfoldM. unfoldM can be useful when producing a list whose size depends on a monadic context.

The only viable alternative I can think of to produce these kinds of lists, without depending on other libraries, is explicit recursion.

Other libraries do export this function: hoogle search, albeit with some variation in signatures.

I think it would be reasonable to add unfoldM to Data.List, (and, for consistency, Data.List.NonEmpty).

I do not think this is adding undue mental burden on the user, as its name is consistent (unfold with a Monadic context, as is the naming scheme for many functions).

Here is the implementation of Data.List.unfoldM I think would be reasonable:

unfoldM :: forall a seed m. Monad m => (seed -> m (Maybe (a, seed))) -> seed -> m [a]
unfoldM f initialSeed = f initialSeed >>= step
  where
    step :: Maybe (a, seed) -> m [a]
    step mTup = case mTup of
      Nothing -> pure []
      Just (el, seed) -> (el :) <$> unfoldM f seed

And here's a version I think would be a reasonable addition to Data.List.NonEmpty:

{-# LANGUAGE ScopedTypeVariables #-}

unfoldM :: forall a seed m. Monad m => (seed -> m (a, Maybe seed)) -> seed -> m (NonEmpty a)
unfoldM f initialSeed = do
  (x, mSeed) <- f initialSeed
  (x :|) <$> g mSeed
  where
    g :: Maybe seed -> m [a]
    g mSeed = case mSeed of
      Nothing -> pure []
      Just seed -> f seed >>= h

    h :: (a, Maybe seed) -> m [a]
    h (x, mSeed) = (x :) <$> g mSeed

Open questions:

  • Is there some kind of fusion possible? (I suspect not)
  • Is there a more general form?[^1] (I suspect not)
  • Do we need/want l and r variants?
  • Do we need/want strict/lazy variants?

[^1]: This gets into the rabbit hole of whether the Foldable typeclass should have a dual...

414owen avatar May 28 '24 11:05 414owen

Have you encountered ListT?

newtype ListT m a
  = ListT (m (Maybe (a, ListT m a)))

It even offers unfoldM :: Monad m => (b -> m (Maybe (a, b))) -> b -> ListT m a. It's not really a question of fusion since the monadic effect must occur at each step, it would be more useful for streaming applications.


If you're interested in the duals of Foldable and Traversable, check out distributive. And here's how to add a dependency to your cabal file.

mixphix avatar May 28 '24 19:05 mixphix

@mixphix yes, I've encountered and used ListT.unfoldM. It feels like base has something missing, with foldr and unfoldr, and foldM but no dual.

414owen avatar May 28 '24 20:05 414owen

Impact assessments are appreciated, though I myself have not been able to successfully build a working version of clc-stackage in some time. For a function with Hackage hits in heavy-duty packages it would be wise to provide maintainers with some notice if it is to be added to base. Just because base doesn't have something doesn't mean it's missing: what programs will base users write using unfoldM, that they cannot understand how to define unfoldM themselves, or import it from some toolkit designed to wield the abstraction?

mixphix avatar May 28 '24 21:05 mixphix

@mixphix to some extent I agree with you, but the vast majority of functions in (say) Data.List are user-definable, yet we, as a community, have decided that some combination of sharing, standardization, and not reinventing the wheel are valuable reasons to include functions in base.

Of course people can write their own unfoldM, but then we end up with quite a lot of unnecessary unfoldMs in the wild (or more likely, more explicit recursion, and less shared vocabulary...). I don't think this is desirable.

414owen avatar May 28 '24 21:05 414owen

Would it also be possible with a version in Data.Foldable1?

ysangkok avatar Jun 15 '24 16:06 ysangkok

Is there a more general form?

Functionality-wise, yes, a stream.

data Stream m a r = Step a (Stream m a r)
                  | Effect (m (Stream m a r))
                  | Return r

unfold :: Monad m => (seed -> m (Maybe (a, seed))) -> seed -> Stream m a ()
unfold f = step
  where
    step x =
      Effect $ do
        m <- f x
        case m of
          Just (a, x') -> pure $ Step a (go x')
          Nothing      -> pure $ Return ()

toList :: Monad m => Stream m a () -> m [a]
toList (Step a s) = (:) a <$> toList s
toList (Effect m) = m >>= toList
toList (Return r) = pure []

Notice how unfold does nothing here: Stream constructors can be used directly to do the exact same thing.

BurningWitness avatar Jun 15 '24 17:06 BurningWitness

Impact assessments are appreciated, though I myself have not been able to successfully build a working version of clc-stackage in some time. For a function with Hackage hits in heavy-duty packages it would be wise to provide maintainers with some notice if it is to be added to base. Just because base doesn't have something doesn't mean it's missing: what programs will base users write using unfoldM, that they cannot understand how to define unfoldM themselves, or import it from some toolkit designed to wield the abstraction?

If your concern is naming conflicts, it's worth pointing out that code which imports Data.List or Data.List.NonEmpty unqualified is pretty much broken already. There is some value in preserving the code that accidentally works because it gets lucky avoiding all the other name conflict landmines when doing this, but not a lot of value.

cdsmith avatar Jun 15 '24 17:06 cdsmith

As noticed by monad-loops

(seed -> m (Maybe (a, seed))) -> seed -> m [a]

can be expressed via

m' (Maybe a) -> m' [a] 

for m' ~ StateT seed m. I wonder if the latter would be a better addition to base.

Bodigrim avatar Jun 15 '24 17:06 Bodigrim

@ysangkok I don't think Foldable1 is the place for unfoldM. Unfolds cannot be expressed as folds (okay, if they can, I wish to remain ignorant...), and we're implementing concrete unfolds, which return non-empty/lists.

@Bodigrim I'm afraid I don't follow the logic of using StateT seed m. As far as I'm aware base doesn't expose any monad transformers. Isn't there also the issue of stacking multiple StateTs? I don't think it's cleaner anyway to be honest. Having to get and put the seed will end up adding boilerplate.

414owen avatar Jun 15 '24 18:06 414owen

As far as I'm aware base doesn't expose any monad transformers.

The suggestion wasn't to introduce monad transformers into base. It was to implement something like:

generateList :: Monad m => m (Maybe a) -> m [a]
generateList m = go
  where
    go = m >>= \case
        Nothing -> pure []
        Just x -> (x:) <$> go

You could trivially use this with StateT to recover the original desired behavior, but it could also be used with other monads, such sampling a probability distribution where "end of stream" can be one of the possible outcomes.

The NonEmpty variant is less appealing: I suppose the argument is something like m (a, Bool).

cdsmith avatar Jun 15 '24 19:06 cdsmith

Thanks @cdsmith, yes, generateList is what I meant. With unfoldM we would certainly like to have both lazy and strict variants, but with generateList a user can just plug lazy or strict StateT.

Bodigrim avatar Jun 15 '24 20:06 Bodigrim

Could the NonEmpty version of generateList be a -> m (Maybe a) -> m (NonEmpty a)?

TikhonJelvis avatar Jun 15 '24 20:06 TikhonJelvis

Could the NonEmpty version of generateList be a -> m (Maybe a) -> m (NonEmpty a)?

That seems wrong to me, since you very well might need to use functionality of the monad to decide on the first element, too. Perhaps m a -> m (Maybe a) -> m (NonEmpty a) is better. But it seems inconvenient in practice for examples I can think of where one wants to monadically generate a non-empty list. They all seem to fall into the category where I know immediately after each element whether it's the last one or not. I don't want to have to stash that in state somewhere just so I can remember to return Nothing on the next polling action.

cdsmith avatar Jun 15 '24 20:06 cdsmith

There seems to be no prior art for NonEmpty variation. There are plenty of potentially useful helpers, especially when it comes to monadic variants, so it is difficult to judge their utility unless someone has had cared enough at least to implement them in a standalone library.

Bodigrim avatar Jun 16 '24 16:06 Bodigrim

@414owen how would you like to move forward? Did you receive enough feedback to refine and come up with the final proposal?

Bodigrim avatar Jun 29 '24 10:06 Bodigrim

@cdsmith re: a -> m (Maybe a) -> m (NonEmpty a)

That seems wrong to me, since you very well might need to use functionality of the monad to decide on the first element, too. Perhaps m a -> m (Maybe a) -> m (NonEmpty a) is better

The caller still has access to that monadic context outside of the call to generateList, and can construct the initial element using it. For example...

do
  initial <- get
  NE.generateList initial stepFn

But on the other hand, if that's the case, why would we even include a nonempty version of this function? It's almost as easy to do this:

do
  initial <- get
  (initial :|) <$> generateList stepFn

414owen avatar Jul 03 '24 10:07 414owen

One reservation I have is that this is a new type of pattern. I wanted to introduce the dual of foldM, but generateList isn't quite it. In fact, you could argue we'd just be adding a new function which itself lacks a dual in base.

The dual of generateList would presumably be something like this:

destroyList :: (a -> m b) -> [a] -> m b

Consistency is quite important. I'm not opposed to this new pattern. It seems reasonable, but if we add it, we'd end up with two missing duals, rather than the one we currently have.

Properly filling out the missing pieces with unfoldM is annoying, as it requires strict/lazy, list/nonempty versions, and adding generateList is certainly an improvement in that sense...

I'm a bit conflicted to be honest.

414owen avatar Jul 03 '24 10:07 414owen

Please do include documentation in your merge request, if you choose to create one.

mixphix avatar Jul 03 '24 11:07 mixphix

The dual of generateList would presumably be something like this:

I'm quite bad at this sort of stuff, but I'm not sure if there is any dual. I imagine the usual derivation is that foldr :: (a -> b -> b) -> b -> [a] -> b is equivalent to (Maybe (a, b) -> b) -> ([a] -> b), which after reversing inner arrows gives us unfoldr :: (b -> Maybe (a, b)) -> (b -> [a]). But how do you apply this process to generateList :: m (Maybe a) -> m [a]?..

Bodigrim avatar Jul 03 '24 18:07 Bodigrim

But how do you apply this process to generateList :: m (Maybe a) -> m [a]?..

Write it as (() -> m (Maybe a) -> (() -> m [a]), then reverse the (Kleisli) arrows and you get (Maybe a -> m ()) -> [a] -> m () or equivalently:

destroyList :: (a -> m ()) -> m () -> [a] -> m ()

sjoerdvisscher avatar Jul 04 '24 15:07 sjoerdvisscher

Fair enough, but in such case (a -> m ()) -> m () -> [a] -> m () is not much different from traverse_.

Bodigrim avatar Jul 04 '24 18:07 Bodigrim

in such case (a -> m ()) -> m () -> [a] -> m () is not much different from traverse_

Yes, I think that's the point. traverse_ is equivalent to for_ is equivalent to foldr. generateList :: Monad m => m (Maybe a) -> m [a] is their dual (despite having to shuffle some arguments around to see that).

tomjaguarpaw avatar Jul 05 '24 10:07 tomjaguarpaw

It sounds we agree to the dual pairings of (foldM/unfoldM) and (traverse_/generateList).

It's unfortunate from an ergonomic perspective that traverse_ and generateList have unrelated names, that don't suggest they are each others' duals. This hurts discoverability, and consistency.

414owen avatar Jul 17 '24 09:07 414owen

@414owen if you made up your mind, please prepare a GHC MR with the final proposal.

Bodigrim avatar Jul 21 '24 17:07 Bodigrim

@414owen if there is no progress by the end of August, I'll close the proposal as abandoned.

Bodigrim avatar Aug 04 '24 19:08 Bodigrim

Closing as abandoned, feel free to reopen when there are resources to continue.

Bodigrim avatar Sep 01 '24 18:09 Bodigrim