foldl icon indicating copy to clipboard operation
foldl copied to clipboard

On constant space

Open meooow25 opened this issue 1 year ago • 4 comments

fold and foldM are defined as

https://github.com/Gabriella439/foldl/blob/473a5f002db29d78d3100930ce043158c5569a50/src/Control/Foldl.hs#L528-L544

But the description also claims that

This library provides strict left folds that stream in constant memory...

Seems to be a little misleading to me. Whether the fold is in constant memory depends greatly on whether GHC is able to optimize foldr for f into a constant-space fold, and it is not a guarantee arising out of this library. If GHC fails here, for whatever reason, it is likely to require linear memory.

For fold, I think this could be avoided by using foldl'. It is a safer bet that the implementor of Foldable f has written a constant-space foldl'. Is there a reason why foldr is used instead of foldl'?

For foldM, I don't know what can be done.


And to demonstrate the problem, we can benchmark on Data.Set. Tree-like structures often define foldr in a way that that GHC cannot optimize.

  sum 100000
    Control.Foldl.fold:    OK
      1.72 ms ±  54 μs, 6.9 MB allocated, 186 KB copied,  22 MB peak memory
    fold but using foldl': OK
      325  μs ±  13 μs,   0 B  allocated,   0 B  copied,  22 MB peak memory
Benchmark source
-- Using GHC 9.6.3, -O
import qualified Control.Foldl as Fold
import Data.Foldable (foldl')
import Data.Set (Set)
import qualified Data.Set as Set
import Test.Tasty.Bench

main :: IO ()
main = defaultMain
  [ env (pure xs_) $ \xs -> bgroup "sum 100000"
    [ bench "Control.Foldl.fold" $ nf (Fold.fold Fold.sum) xs
    , bench "fold but using foldl'" $ nf (fold' Fold.sum) xs
    ]
  ]
  where
    xs_ = Set.fromList [1..100000] :: Set Int

fold' :: Foldable f => Fold.Fold a b -> f a -> b
fold' (Fold.Fold step start done) xs = done (foldl' step start xs)
{-# INLINE fold' #-}

meooow25 avatar Jul 03 '24 12:07 meooow25

Thinking about it more, I realized that there is a more fundamental limitation than GHC optimations: the structure itself. A Set cannot be folded over without requiring space proportional to the depth of the tree, which is $O(\log n)$. The above benchmarks only show that foldl' does not allocate on the heap, but it still uses $O(\log n)$ stack space.

To reiterate, I want to draw attention to the facts that:

  1. The foldl library cannot make accurate claims about the space required to apply a Fold. It depends on the structure to be folded over, the implementation of foldr/foldl' for that structure, and the GHC optimizations at play. Even at their most efficient, the same fold will have different requirements: Fold.fold Fold.sum (xs :: [Int]) is $O(1)$ space and Fold.fold Fold.sum (xs :: Set Int) is $O(\log n)$ space.
  2. For some structures (like Set as demonstrated), the implementation of foldl' is practically more efficient than obtaining a left fold via foldr. At the same time, I cannot think of any situation where the opposite would be true. So I believe foldl' would be a better delegate for Fold.fold rather than foldr.

meooow25 avatar Jul 14 '24 04:07 meooow25

Yeah, it should probably clarify that it's only guaranteed to be constant space for lists.

Also, if I'm being really pedantic, it's only guaranteed to be constant space if all folds involved (including user-defined Folds) use an accumulator that is internally strict. This property holds for the existing Folds provided by the foldl package but might not necessarily hold for user-defined Folds.

Gabriella439 avatar Aug 21 '24 03:08 Gabriella439

That's true, and clearer documentation would be great. And what do you think of switching to foldl'?

meooow25 avatar Nov 10 '24 09:11 meooow25

Oh yeah, I would be okay with switching to foldl'

Gabriella439 avatar Dec 20 '24 03:12 Gabriella439