core-libraries-committee icon indicating copy to clipboard operation
core-libraries-committee copied to clipboard

Add a `foldlM'` combinator to "Data.Foldable" that is strict in the accumulator.

Open vdukhovni opened this issue 1 year ago • 44 comments

Presently, "Data.Foldable" provides a foldlM combinator defined as:

foldlM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM f z0 xs = foldr c return xs z0
  -- See Note [List fusion and continuations in 'c']
  where c x k z = f z x >>= k
        {-# INLINE c #-}

Note that this is not strict in the accumulator, and in at least some cases leaks space when compiled with -O, rather than -O2, even when f` is strict. I am proposing the addition of a:

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = foldr c pure xs z0
  -- See Note [List fusion and continuations in 'c']
  where c x k !z = f z x >>= k
        {-# INLINE c #-}

which is strict in the accumulator. This was prompted by an episode of the Haskell Unfolder.

There are perhaps some nits to decide, e.g. whether list-fusion is compatible with eta-reduction by writing instead:

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f = flip (foldr c pure)
  -- See Note [List fusion and continuations in 'c']
  where c x k !z = f z x >>= k
        {-# INLINE c #-}

or instead:

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = \ z0 xs -> foldr c pure xs z0
  -- See Note [List fusion and continuations in 'c']
  where c x k !z = f z x >>= k
        {-# INLINE c #-}

For the record, the "Note" in question reads:

Suppose we define 
    mapM_ f = foldr ((>>) . f) (return ())
(this is the way it used to be).
Now suppose we want to optimise the call
    mapM_ <big> (build g)
      where 
        g c n = ...(c x1 y1)...(c x2 y2)....n...
GHC used to proceed like this:
    mapM_ <big> (build g)
              
    = { Definition of mapM_ }
      foldr ((>>) . <big>) (return ()) (build g)

    = { foldr/build rule }
      g ((>>) . <big>) (return ())

    = { Inline g }
      let c = (>>) . <big>
          n = return ()
      in ...(c x1 y1)...(c x2 y2)....n...
The trouble is that `c`, being big, will not be inlined.  And that can
be absolutely terrible for performance, as we saw in #8763.
It's much better to define
    mapM_ f = foldr c (return ())
      where
        c x k = f x >> k
        {-# INLINE c #-}
Now we get
    mapM_ <big> (build g)

    = { inline mapM_ }
      foldr c (return ()) (build g)
        where c x k = f x >> k
              {-# INLINE c #-}
              f = <big>
Notice that `f` does not inline into the RHS of `c`,
because the INLINE pragma stops it; see
Note [Simplifying inside stable unfoldings] in GHC.Core.Opt.Simplify.Utils.
Continuing:
    = { foldr/build rule }
      g c (return ())
        where ...
           c x k = f x >> k
           {-# INLINE c #-}
              f = <big>

    = { inline g }
      ...(c x1 y1)...(c x2 y2)....n...
        where c x k = f x >> k
              {-# INLINE c #-}
              f = <big>
              n = return ()

        Now, crucially, `c` does inline

    = { inline c }
      ...(f x1 >> y1)...(f x2 >> y2)....n...
        where f = <big>
              n = return ()
And all is well!  The key thing is that the fragment
`(f x1 >> y1)` is inlined into the body of the builder
`g`.

vdukhovni avatar Feb 13 '24 16:02 vdukhovni

foldlM' f z0 xs = foldr c pure xs z0
  where c x k !z = f z x >>= k

Could you explain why the definition in terms of foldr and not a simpler definition in terms of foldlM, say one of the below?

foldlM' f = foldlM (\(!b) -> f b)

foldlM' f = foldlM (\b a -> f b a >>= (pure $!))

tomjaguarpaw avatar Feb 13 '24 16:02 tomjaguarpaw

Could you explain why the definition in terms of foldr and not a simpler definition in terms of foldlM, say one of the below?

It seems you're right, the below seems to work equally well:

foldlM' f = foldlM (\ !acc -> f acc)

as does ensuring that the acc argument to f is manifestly strict.

vdukhovni avatar Feb 13 '24 17:02 vdukhovni

Strictness annotations are great for enforcing the evaluation order even in an Applicative context.

foldMapA :: (Foldable t, Applicative f, Monoid y) => (x -> f y) -> t x -> f y
foldMapA f = foldl' (\ !fy x -> liftA2 (<>) fy (f x)) (pure mempty)

mixphix avatar Feb 13 '24 17:02 mixphix

So, modulo the choice of definition, is there support for adding a foldlM'?

vdukhovni avatar Feb 13 '24 21:02 vdukhovni

A draft merge request indicates a serious commitment to seeing the changes through. A branch with these changes would easily pass most of the GHC CI (in time). Benchmarks would also help your case.

mixphix avatar Feb 14 '24 00:02 mixphix

As demonstrated above, foldlM' can be obtained from foldlM by a correct choice of f. That means that, unlike in the case of foldl, we don't need the strict version. It might be helpful to have it anyway, but it doesn't seem urgent.

FWIW, foldlM is equivalent to foldlM' for any return-strict monad, and I'm inclined to suggest that users should be using return-strict monads by default anyway (even though there aren't many of them in the ecosystem yet).

tomjaguarpaw avatar Feb 14 '24 07:02 tomjaguarpaw

As demonstrated above, foldlM' can be obtained from foldlM by a correct choice of f.

This is not true. Or rather, it is true only for some monads.
Take lazy State or Writer, for instance. The proposed alternatives will show a space leak.

Benchmark
{-# LANGUAGE BangPatterns #-}
import Control.Monad.Trans.State.Lazy
import Data.Foldable (foldlM)
import Test.Tasty.Bench

main :: IO ()
main = defaultMain
  [ env (pure xs_) $ \xs ->
    bgroup "bench"
    [ bench "foldlM" $ whnf (benchFoldlM xs) foldlM
    , bench "foldlM_1" $ whnf (benchFoldlM xs) foldlM_1
    , bench "foldlM_2" $ whnf (benchFoldlM xs) foldlM_2
    , bench "foldlM'" $ whnf (benchFoldlM xs) foldlM'
    ]
  ]
  where
    xs_ :: [Int]
    xs_ = [1..100000]

benchFoldlM
  :: [Int]
  -> ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int)
  -> Int
benchFoldlM xs foldlM = flip evalState () $ foldlM (\a b -> pure (a+b)) 0 xs

foldlM_1 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_1 f = foldlM (\ !x -> f x)

foldlM_2 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_2 f = foldlM (\b a -> f b a >>= (pure $!))

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = foldr c pure xs z0
  where c x k !z = f z x >>= k
        {-# INLINE c #-}
All
  bench
    foldlM:   OK
      30.5 ms ± 2.2 ms,  29 MB allocated,  36 MB copied,  52 MB peak memory
    foldlM_1: OK
      35.8 ms ± 2.9 ms,  33 MB allocated,  38 MB copied,  52 MB peak memory
    foldlM_2: OK
      30.4 ms ± 2.9 ms,  29 MB allocated,  37 MB copied,  52 MB peak memory
    foldlM':  OK
      4.90 ms ± 456 μs,  25 MB allocated, 1.5 KB copied,  52 MB peak memory

meooow25 avatar Feb 14 '24 19:02 meooow25

@tomjaguarpaw What is a "return-strict monad"? Making pure strict in (almost any) Monad violates the monad identity laws, badly breaking reasoning.

treeowl avatar Feb 14 '24 20:02 treeowl

@meooow25 This is very interesting, thank you! For the purpose of avoiding space leaks, the proposed foldM is as good as

foldMS _ z [] = pure z
foldMS f z (x:xs) = do
  !z' <- f z x
  foldMS f z' xs

My mistake was assuming that this was equivalent to

foldMS2 _ z [] = pure z
foldMS2 f z (x:xs) = do
  z' <- f z x
  pure $! z'
  foldMS2 f z' xs

In a lazy monad the equivalence doesn't hold, because undefined >>= pure x == pure x. I find lazy monads extremely hard to understand and I'm doubtful whether we should be encouraging their use, so I don't see this proposal as particular well-motivated from the point of view of avoiding space leaks. In terms of raw performance, my experiments with your benchmark code suggest that foldlM_2 is as good as foldlM' for strict State and IO (foldlM_1 is worse, but I don't understand why).

@treeowl See the Discourse post I linked for an explanation. I would appreciate it if you can give some demonstrations of how violation of the pure law badly breaks reasoning in practice, because no one was able to come up with one in that discussion.

tomjaguarpaw avatar Feb 14 '24 23:02 tomjaguarpaw

I think the monadFix example in that discussion was very clear, and also definitive.

The fact that you don not use or like MonadFix hardly matters -- its a very real use case, and the reasoning to arrive at a given MonadFix instance is exactly the sort of reasoning that breaks.

gbaz avatar Feb 14 '24 23:02 gbaz

@meooow25, here's my entry for "can we do it with just foldlM?". It doesn't leak, but it's not as fast as the custom strict fold, and I'm not optimistic it can be made to perform as well in general.

-- | Turn a "lazy monad" like lazy State into a strict one.
newtype Strictly m a = Strictly
  { unStrictly :: m (Solo a) }

instance MonadTrans Strictly where
  lift = Strictly . fmap MkSolo

-- Run a `Strictly` computation. Forcing the result(s) forces all intermediate computations.
runStrictly :: Monad m => Strictly m a -> m a
runStrictly (Strictly m) =
  m >>= \(MkSolo a) -> pure a

instance Monad m => Functor (Strictly m) where
  fmap = liftM
  x <$ m = m >> pure x

instance Monad m => Applicative (Strictly m) where
  pure = Strictly . pure . MkSolo
  liftA2 = liftM2
  m *> n = m >>= \_ -> n

instance Monad m => Monad (Strictly m) where
  Strictly m >>= f = Strictly $
    m >>= \(MkSolo a) -> unStrictly (f a)

foldlM_3 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_3 f b = runStrictly . foldlM (\ !x -> lift . f x) b

Here are the benchmarks:

All
  bench
    foldlM:   OK
      51.7 ms ± 1.3 ms,  29 MB allocated,  36 MB copied,  52 MB peak memory
    foldlM_1: OK
      63.4 ms ± 2.3 ms,  34 MB allocated,  39 MB copied,  52 MB peak memory
    foldlM_2: OK
      52.0 ms ± 4.2 ms,  28 MB allocated,  35 MB copied,  52 MB peak memory
    foldlM_3: OK
      12.2 ms ± 824 μs,  34 MB allocated, 2.6 KB copied,  52 MB peak memory
    foldlM':  OK
      9.66 ms ± 693 μs,  25 MB allocated, 1.5 KB copied,  52 MB peak memory

@tomjaguarpaw , I really don't like having to think about what functor/applicative/monad I'm working in when using the laws to transform code. It's a lot of extra mental overhead. That applies equally to your "return-strict monads" and to such monstrosities as lazy Writer and lazy State.

treeowl avatar Feb 14 '24 23:02 treeowl

BTW, Codensity from the kan-extensions package seems to work as well as my Strictly monad transformer in the context of these benchmarks, but that seems a bit over the top.

treeowl avatar Feb 15 '24 04:02 treeowl

here's my entry for "can we do it with just foldlM?".

Fantastic idea!

EDIT: My version with IdentityT was benchmarking the wrong thing.

~~In fact, if I'm benchmarking it right, the same approach with IdentityT recovers the performance of the custom strict fold.~~

    foldlM_3I: OK
      31.8 ms ± 1.7 ms, 175 MB allocated, 7.4 KB copied,  84 MB peak memory
    foldlM':   OK
      33.6 ms ± 1.6 ms, 175 MB allocated, 6.9 KB copied, 163 MB peak memory
    foldlM_3:  OK
      45.8 ms ± 1.8 ms, 266 MB allocated,  12 KB copied, 245 MB peak memory
foldlM_3I :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_3I f b = runIdentityT . foldlM (\ !x -> lift . f x) b

~~I'd be grateful if someone can attempt to replicate this to make sure I haven't made a silly mistake.~~

Yeah, it was a silly mistake :(


Off-topic (please make replies on the return strict monad Discourse thread).

I really don't like having to think about what functor/applicative/monad I'm working in when using the laws to transform code

Absolutely. Additionally, I don't like having to think about space leaks introduced by failing to make invalid laziness unrepresentable. There's a trade off here and I think it's interesting to consider whether it's possible weaken one of the monad laws to f >=> pure == (f $!) and receive outsized benefits in return. I haven't yet seen anything that convinces me that approach is futile. If you are convinced then I would appreciate it if you can share the evidence that convinces you.

I think the monadFix example in that discussion was very clear, and also definitive.

Subsequent discussion on the thread demonstrated that MonadFix is still available in return strict monads.

I would welcome more input on the topic of "return strict" monads, particularly dissenting opinions with concrete examples of things that go wrong when using a "return strict" monad. Please keep the discussion on the return strict Discourse thread though.

tomjaguarpaw avatar Feb 15 '24 08:02 tomjaguarpaw

I haven't tried with IdentityT, but as far as I can tell, the only way it could possibly do something here is by subtly tweaking the optimizer (most likely through arity changes). The dot in lift . f x might be doing more than its share of work. IdentityT is absolutely the blandest monad transformer imaginable.

treeowl avatar Feb 15 '24 09:02 treeowl

You're right. I was benchmarking the wrong thing!

tomjaguarpaw avatar Feb 15 '24 10:02 tomjaguarpaw

As a practical matter, why would you be doing a strict monadic fold in a lazy monad? It seems generally better to do that in the corresponding "strict" monad and convert the resulting action to the lazy form. That's why I haven't made a package for something like the above Strictly—it seems like it would need to go in the acme category.

treeowl avatar Feb 15 '24 10:02 treeowl

Here's a benchmark I believe to be correct. It shows that a definition of monad Strictly2 is sufficient to get results that are twice as fast as the custom strict fold! I'm actually having trouble understanding how this works, but it does seem to.

Again, I'd appreciate it if someone could double-check my work because it's easy to introduce small errors that have big effects.

    foldlM_Strictly2: OK
      3.50 ms ± 173 μs,  18 MB allocated, 999 B  copied,  14 MB peak memory
    foldlM':          OK
      7.60 ms ± 281 μs,  25 MB allocated, 1.5 KB copied,  22 MB peak memory
    foldlM_Strictly:  OK
      9.02 ms ± 688 μs,  34 MB allocated, 2.4 KB copied,  29 MB peak memory
    foldlM_1S:        OK
      56.7 ms ± 2.9 ms,  34 MB allocated,  42 MB copied,  65 MB peak memory
    foldlM_2S:        OK
      60.7 ms ± 2.9 ms,  33 MB allocated,  46 MB copied,  98 MB peak memory

The operative definition is the strict bind.

instance Monad m => Monad (Strictly2 m) where
  Strictly2 m >>= f = Strictly2 $
    m >>= \ !a -> unStrictly2 (f a)

N.B. Strictly2 violates the monad law pure >=> f == f. Instead pure >=> f == (f $!), so it is a "return strict" monad.

Code

#!/usr/bin/env cabal
{- cabal:
build-depends: base, transformers, tasty-bench, mtl
-}
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O1 #-}
import Control.Monad.Trans.State.Lazy
import Control.Monad.Trans
import Data.Foldable (foldlM, toList)
import Test.Tasty.Bench
import Data.Tuple
import Control.Applicative
import Control.Monad

main :: IO ()
main = defaultMain
  [ env (pure 100000) $ \n ->
    bgroup "bench"
    [ bench "foldlM_Strictly2" $ whnf (benchFoldlM (xs n)) foldlM_Strictly2
    , bench "foldlM'" $ whnf (benchFoldlM (xs n)) foldlM'
    , bench "foldlM_Strictly" $ whnf (benchFoldlM (xs n)) foldlM_Strictly
    , bench "foldlM_1S" $ whnf (benchFoldlM (xs n)) foldlM_1
    , bench "foldlM_2S" $ whnf (benchFoldlM (xs n)) foldlM_2
    ]
  ]
  where
    xs :: Int -> [Int]
    xs n = [1..n]

benchFoldlM
  :: [Int]
  -> ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int)
  -> Int
benchFoldlM xs foldlM = flip evalState () $ foldlM (\a b -> pure (a+b)) 0 xs

foldlM_1 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_1 f b = foldlM (\ !x -> f x) b . toList

foldlM_2 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_2 f = foldlM (\b a -> f b a >>= (pure $!))

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = foldr c pure xs z0
  where c x k !z = f z x >>= k

-- | Turn a "lazy monad" like lazy State into a strict one.
newtype Strictly m a = Strictly
  { unStrictly :: m (Solo a) }

instance MonadTrans Strictly where
  lift = Strictly . fmap Solo

-- Run a `Strictly` computation. Forcing the result(s) forces all
-- intermediate computations.
runStrictly :: Monad m => Strictly m a -> m a
runStrictly (Strictly m) =
  m >>= \(Solo a) -> pure a

instance Monad m => Functor (Strictly m) where
  fmap = liftM
  x <$ m = m >> pure x

instance Monad m => Applicative (Strictly m) where
  pure = Strictly . pure . Solo
  liftA2 = liftM2
  m *> n = m >>= \_ -> n

instance Monad m => Monad (Strictly m) where
  Strictly m >>= f = Strictly $
    m >>= \(Solo a) -> unStrictly (f a)

foldlM_Strictly :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_Strictly f b = runStrictly . foldlM (\ !x -> lift . f x) b

-- | Turn a "lazy monad" like lazy State into a strict one.
newtype Strictly2 m a = Strictly2
  { unStrictly2 :: m a }

instance MonadTrans Strictly2 where
  lift = Strictly2

-- Run a `Strictly2` computation. Forcing the result(s) forces all
-- intermediate computations.
runStrictly2 :: Monad m => Strictly2 m a -> m a
runStrictly2 (Strictly2 m) = m

instance Monad m => Functor (Strictly2 m) where
  fmap = liftM

instance Monad m => Applicative (Strictly2 m) where
  pure = Strictly2 . pure
  liftA2 = liftM2

instance Monad m => Monad (Strictly2 m) where
  Strictly2 m >>= f = Strictly2 $
    m >>= \ !a -> unStrictly2 (f a)

foldlM_Strictly2 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_Strictly2 f b = runStrictly2 . foldlM (\ x -> lift . f x) b

tomjaguarpaw avatar Feb 15 '24 10:02 tomjaguarpaw

As for the benchmarks of foldMapA:

All
  bench
    foldMapA: OK
      6.29 ns ± 508 ps
    foldlM:   OK
      19.2 ms ± 1.6 ms
    foldlM_1: OK
      24.0 ms ± 2.3 ms
    foldlM_2: OK
      19.7 ms ± 1.9 ms
    foldlM':  OK
      3.63 ms ± 251 μs
import Control.Monad.State
import Data.Foldable qualified as Fold
import Data.Monoid
import Test.Tasty.Bench

main :: IO ()
main =
  defaultMain
    [ env (pure @IO @Int 100_000) \n ->
        bgroup "bench" $
          bench "foldMapA" (whnf example n)
            : fmap
              (\(name, f) -> bench name $ whnf (benchFoldlM n) f)
              [ ("foldlM", Fold.foldlM)
              , ("foldlM_1", foldlM_1)
              , ("foldlM_2", foldlM_2)
              , ("foldlM'", foldlM')
              ]
    ]

example :: Int -> State () (Sum Int)
example i = foldMapA (pure . Sum) [1 .. i] where
  foldMapA :: (Foldable t, Applicative f, Monoid m) => (x -> f m) -> t x -> f m
  foldMapA f = Fold.foldl' (\ !fm x -> liftA2 (<>) fm (f x)) (pure mempty)

benchFoldlM ::
  Int ->
  ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int) ->
  Int
benchFoldlM i __ =
  flip evalState () $ __ (\a b -> pure (a + b)) 0 [1 .. i]

foldlM_1 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_1 f b = Fold.foldlM (\ !x -> f x) b . Fold.toList

foldlM_2 :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_2 f = Fold.foldlM (\b a -> f b a >>= (pure $!))

foldlM' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM' f z0 xs = Fold.foldr c pure xs z0
 where
  c x k !z = f z x >>= k

mixphix avatar Feb 15 '24 15:02 mixphix

The foldA benchmark isn't actually running the state computation. You need an evalState in example.

tomjaguarpaw avatar Feb 15 '24 15:02 tomjaguarpaw

@treeowl nice, that was a fun exercise to figure out how Strictly works.

@tomjaguarpaw regarding Strictly2, it indeed seems to perform better. Your code is equivalent to

foldlM_likeStrictly2' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_likeStrictly2' f z0 xs = foldr c pure xs z0
 where
  c x k z = f z x >>= \ !x -> k x -- this is >>= of Strictly2

I see similar results if we change the proposed foldlM' to

foldlM_finalStrict' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_finalStrict' f z0 xs = foldr c (pure $!) xs z0
                                         -- ^ strict here too
  where c x k !z = f z x >>= k

In fact I was going to suggest that we make the proposed function strict in the final value.

Also note that foldlM_likeStrictly2', unlike foldlM_finalStrict', is not strict in the initial value z0 (i.e. it is passed to f without being forced). But this can be changed with a bang on z0, for instance.

foldlM_likeStrictly2InitialStrict' :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM_likeStrictly2InitialStrict' f !z0 xs = foldr c pure xs z0
 where
  c x k z = f z x >>= (k $!)

@vdukhovni I suggest that the proposed function be updated to be strict in the final value, as in foldlM_finalStrict' (or alternately foldlM_likeStrictly2InitialStrict'). This would make it match the function shown in the Haskell Unfolder episode you mentioned (named repeatedlyM1 there).

Also, forgot to say this before, as a user I am +1 on seeing foldlM' in Data.Foldable.


Aside on benchmarking

The purpose of env foo $ \fooData -> ... is to make sure foo is evaluated to NF before the benchmark is run. Please don't move it inside the benchmark, otherwise you're also measuring the time it takes to generate foo. Depending on how it's used, it might also get shared between runs or benchmarks, making things even more uncertain.

meooow25 avatar Feb 15 '24 17:02 meooow25

The foldA benchmark isn't actually running the state computation. You need an evalState in example.

You're so right! Let's fix that!

main :: IO ()
main =
  defaultMain
    [ env (pure @IO @Int 100_000) \n ->
        bgroup "bench" do
          fmap
            (\(name, f) -> bench name $ whnf (either ($ n) $ benchFoldlM n) f)
            [ ("foldMapA", Left example)
            , ("foldlM", Right Fold.foldlM)
            , ("foldlM_1", Right foldlM_1)
            , ("foldlM_2", Right foldlM_2)
            , ("foldlM'", Right foldlM')
            ]
    ]

example :: Int -> Int
example i = getSum $ evalState (foldMapA (pure . Sum) [1 .. i]) ()
 where
  foldMapA f = Fold.foldl' (\ !fm x -> liftA2 (<>) fm (f x)) (pure mempty)
  bench
    foldMapA: OK
      856  μs ±  53 μs
    foldlM:   OK
      19.1 ms ± 1.8 ms
    foldlM_1: OK
      24.0 ms ± 2.2 ms
    foldlM_2: OK
      19.6 ms ± 1.9 ms
    foldlM':  OK
      3.63 ms ± 248 μs

mixphix avatar Feb 15 '24 18:02 mixphix

Speaking of accidentally benchmarking the wrong thing, I was wrong about Codensity; it actually does better than Strictly, and very nearly as well as the custom strict fold!

treeowl avatar Feb 15 '24 21:02 treeowl

@mixphix Your example isn't comparable. You're folding over a list that will end up fusing away, and not being realized in memory. That said, you're also theoretically building a huge State computation, but ... I'm betting GHC is being really clever about your left fold in a way that makes it basically throw away the whole thing. I'll need to look at some Core to understand better. Your example definitely suggests that we should be more careful here, and in particular should use the state sometimes—or at least pretend to.

treeowl avatar Feb 15 '24 21:02 treeowl

You're folding over a list

benchFoldlM
  :: [Int]
  -> ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int)
  -> Int
benchFoldlM xs foldlM = flip evalState () $ foldlM (\a b -> pure (a+b)) 0 xs

mixphix avatar Feb 16 '24 04:02 mixphix

You're folding over a list

benchFoldlM
  :: [Int]
  -> ((Int -> Int -> State () Int) -> Int -> [Int] -> State () Int)
  -> Int
benchFoldlM xs foldlM = flip evalState () $ foldlM (\a b -> pure (a+b)) 0 xs

I'm sorry, but I have no idea what any of this comment means. Could you please expand on that?

treeowl avatar Feb 16 '24 04:02 treeowl

Thanks to @meooow25 and @treeowl I understand better what's going on here, and subsequently I have more confidence that my original alternative is the correct implementation.

As pointed out by @treeowl, Strictly turns a lazy monad into a strict one. Strictly2 goes a step further and turns

foldMS _ z [] = pure z
foldMS f z (x:xs) = do
  z' <- f z x
  foldMS f z' xs

into

foldMS _ z [] = pure z
foldMS f z (x:xs) = do
  !z' <- f z x
  foldMS f z' xs

But is this really want you want in a lazy monad? @meooow25 pointed out, and I was surprised, that my original alternative still leaks space for lazy monads. But now I realise that's the point! The point of a lazy monad is to "trade space for time" in a certain sense. In the lazy state monad, if you have a very expensive integer state computation, followed by put 0, then the expensive computation will not be performed (in exchange for this benefit you pay by keeping the whole integer state computation around until evaluation time).

The foldlM' of this proposal is incorrect from that point of view. If the user is using a lazy monad then the whole point is to make this tradeoff![^1] If we're going to add a foldlM' it should be one that doesn't subvert the user's choice in this regard.

[^1]: I don't actually recommend ever using lazy monads for practical code.

tomjaguarpaw avatar Feb 16 '24 08:02 tomjaguarpaw

@vdukhovni do you wish to proceed with the CLC proposal?

mixphix avatar Feb 17 '24 12:02 mixphix

@vdukhovni how would you like to proceed with the proposal? If there are no updates in a month, I'll close as abandoned.

Bodigrim avatar Apr 21 '24 14:04 Bodigrim

Having read all the comments, I'm not sure what to make of them.

On the one hand, from https://github.com/haskell/core-libraries-committee/issues/255#issuecomment-1944470658 it seems that the proposed foldlM'is advantageous in some situations (outperforms alternatives).

On the other hand, there are perhaps some objections I have failed to appreciate. Are the objections about how to define foldlM' or whether to do it at all???

My instinct is to try to help a typical proficient, but non-expert user, without breaking anything, and or leading to surprising behaviour. So I still think a foldlM' could be useful, but I can't tell whether there's either a consensus way forward, or a consensus objection in the above comments.

vdukhovni avatar Apr 21 '24 15:04 vdukhovni

I can't tell whether there's either a consensus way forward, or a consensus objection in the above comments.

There is no need to build a consensus. A simple majority is enough.

Bodigrim avatar Apr 22 '24 20:04 Bodigrim