foldl
foldl copied to clipboard
On constant space
fold and foldM are defined as
https://github.com/Gabriella439/foldl/blob/473a5f002db29d78d3100930ce043158c5569a50/src/Control/Foldl.hs#L528-L544
But the description also claims that
This library provides strict left folds that stream in constant memory...
Seems to be a little misleading to me. Whether the fold is in constant memory depends greatly on whether GHC is able to optimize foldr for f into a constant-space fold, and it is not a guarantee arising out of this library. If GHC fails here, for whatever reason, it is likely to require linear memory.
For fold, I think this could be avoided by using foldl'. It is a safer bet that the implementor of Foldable f has written a constant-space foldl'. Is there a reason why foldr is used instead of foldl'?
For foldM, I don't know what can be done.
And to demonstrate the problem, we can benchmark on Data.Set. Tree-like structures often define foldr in a way that that GHC cannot optimize.
sum 100000
Control.Foldl.fold: OK
1.72 ms ± 54 μs, 6.9 MB allocated, 186 KB copied, 22 MB peak memory
fold but using foldl': OK
325 μs ± 13 μs, 0 B allocated, 0 B copied, 22 MB peak memory
Benchmark source
-- Using GHC 9.6.3, -O
import qualified Control.Foldl as Fold
import Data.Foldable (foldl')
import Data.Set (Set)
import qualified Data.Set as Set
import Test.Tasty.Bench
main :: IO ()
main = defaultMain
[ env (pure xs_) $ \xs -> bgroup "sum 100000"
[ bench "Control.Foldl.fold" $ nf (Fold.fold Fold.sum) xs
, bench "fold but using foldl'" $ nf (fold' Fold.sum) xs
]
]
where
xs_ = Set.fromList [1..100000] :: Set Int
fold' :: Foldable f => Fold.Fold a b -> f a -> b
fold' (Fold.Fold step start done) xs = done (foldl' step start xs)
{-# INLINE fold' #-}
Thinking about it more, I realized that there is a more fundamental limitation than GHC optimations: the structure itself. A Set cannot be folded over without requiring space proportional to the depth of the tree, which is $O(\log n)$. The above benchmarks only show that foldl' does not allocate on the heap, but it still uses $O(\log n)$ stack space.
To reiterate, I want to draw attention to the facts that:
- The
foldllibrary cannot make accurate claims about the space required to apply aFold. It depends on the structure to be folded over, the implementation offoldr/foldl'for that structure, and the GHC optimizations at play. Even at their most efficient, the same fold will have different requirements:Fold.fold Fold.sum (xs :: [Int])is $O(1)$ space andFold.fold Fold.sum (xs :: Set Int)is $O(\log n)$ space. - For some structures (like
Setas demonstrated), the implementation offoldl'is practically more efficient than obtaining a left fold viafoldr. At the same time, I cannot think of any situation where the opposite would be true. So I believefoldl'would be a better delegate forFold.foldrather thanfoldr.
Yeah, it should probably clarify that it's only guaranteed to be constant space for lists.
Also, if I'm being really pedantic, it's only guaranteed to be constant space if all folds involved (including user-defined Folds) use an accumulator that is internally strict. This property holds for the existing Folds provided by the foldl package but might not necessarily hold for user-defined Folds.
That's true, and clearer documentation would be great. And what do you think of switching to foldl'?
Oh yeah, I would be okay with switching to foldl'