conduit icon indicating copy to clipboard operation
conduit copied to clipboard

Better performance with a left associative `.|`?

Open kabuhr opened this issue 9 months ago • 2 comments

In a recent Stack Overflow question, Kwobny asked why .| was infixr. The potential "performance issue" raised in that question wasn't very compelling (i.e., so what if ((...((c1 .| c2) .| c3) ... .| c999999) .| c1000000) | return () is faster than the alternative), but I ended up running a few quick Criterion benchmarks on simple, pure conduits:

rassoc1 :: ConduitT () Void Identity Integer
rassoc1 = yieldMany [0..1000000] .| mapC (*2) .| mapC (+4) .| sumC

rassoc2 :: ConduitT () Void Identity Integer
rassoc2 = yieldMany [1..] .| filterC (50 `divides`) .| takeC 10000 .| mapC (*2) .| sumC
    where d `divides` n = n `mod` d == 0

versus their left-associative equivalents (see below) and was surprised to discover that the left-associative versions were orders of magnitude faster than the right-associative versions (like 50ms to 10ms for rassoc1 versus its left-associative equivalent, and 8ms to 100us for rassoc2).

I admittedly haven't spent a lot of time studying this, but I thought I'd raise an issue here in case there's a quick explanation. Was the decision to encourage right associative conduits made early on because they "looked right", or is there some specific advantage to right associative conduits that makes the default associativity of .| the right choice (pun!). Maybe left associative conduits don't actually achieve constant memory usage? Maybe there's some advantage in the ordering of monadic effects for right associative conduits? Or maybe the performance issue above is just a missing REWRITE rule, and right associative conduits achieve better performance in more realistic workflows. Any thoughts?

The benchmark:

import Criterion
import Criterion.Main
import Conduit

(|.) ::  Monad m => ConduitT a b m () -> ConduitT b c m r -> ConduitT a c m r
(|.) = (.|)
infixl 2 |.

d `divides` n = n `mod` d == 0

rassoc1, rassoc2 :: ConduitT () Void Identity Integer
rassoc1 = yieldMany [0..1000000] .| mapC (*2) .| mapC (+4) .| sumC
rassoc2 = yieldMany [1..] .| filterC (50 `divides`) .| takeC 10000 .| mapC (*2) .| sumC

lassoc1, lassoc2 :: ConduitT () Void Identity Integer
lassoc1 = yieldMany [0..1000000] |. mapC (*2) |. mapC (+4) |. sumC
lassoc2 = yieldMany [1..] |. filterC (50 `divides`) |. takeC 10000 |. mapC (*2) |. sumC

main = defaultMain
  [ bench "rassoc1" $ whnf runConduit rassoc1
  , bench "lassoc1" $ whnf runConduit lassoc1
  , bench "rassoc2" $ whnf runConduit rassoc2
  , bench "lassoc2" $ whnf runConduit lassoc2
  ]

{-
   $ ghc --version
   The Glorious Glasgow Haskell Compilation System, version 9.4.8
   $ ghc -O ConduitAssoc1.hs && ./ConduitAssoc1
   Loaded package environment from /u/buhr/.ghc/x86_64-linux-9.4.8/environments/default
   benchmarking rassoc1
   time                 49.88 ms   (49.60 ms .. 50.58 ms)
   ...
   benchmarking lassoc1
   time                 11.51 ms   (11.28 ms .. 11.76 ms)
   ...
   benchmarking rassoc2
   time                 8.887 ms   (8.881 ms .. 8.893 ms)
   ...
   benchmarking lassoc2
   time                 118.8 μs   (118.0 μs .. 120.4 μs)
   ...
-}

kabuhr avatar May 03 '24 16:05 kabuhr