Perf regressions when generating all streams using unfolds

Open harendra-kumar opened this issue 3 years ago • 0 comments

The build flag use-unfolds (PR #1717) implements all StreamD stream generation routines with unfolds. The following significant regressions are seen with this change compared to without it:

Data.Parser(cpuTime)
Benchmark                                                                   default(0)(μs) default(1) - default(0)(%)
--------------------------------------------------------------------------- -------------- --------------------------
All.Data.Parser/o-1-space.shortest                                                   74.57                   +3035.18
All.Data.Parser/o-1-space.tee                                                        74.55                   +3028.79
All.Data.Parser/o-1-space.teeFst                                                     74.57                   +2991.24
All.Data.Parser/o-1-space.longest                                                   111.95                   +1983.32
All.Data.Parser/o-1-space.concatSequence                                           1260.95                    +157.70
All.Data.Parser/o-1-space.takeStartBy                                              1114.74                     +25.44

Data.Parser(Allocated)
Benchmark                                                                   default(0)(KiB) default(1) - default(0)(%)
--------------------------------------------------------------------------- --------------- --------------------------
All.Data.Parser/o-1-space.concatSequence                                            6239.98                    +200.36
All.Data.Parser/o-1-space.takeStartBy                                               6239.97                     +25.15
All.Data.Parser/o-1-space.parseBreak (recursive)                                   11691.76                     +13.41
All.Data.Parser/o-1-space.parseMany/Unfold/1000 arrays/take 1                         27.20                     +13.24

Data.Parser.ParserD(cpuTime)
Benchmark                                                                      default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------------ -------------- --------------------------
All.Data.Parser.ParserD/o-1-space.shortest (all,any)                                    74.59                   +3054.04
All.Data.Parser.ParserD/o-1-space.teeFst (all,any)                                      74.58                   +2981.74
All.Data.Parser.ParserD/o-1-space.tee (all,any)                                         74.55                   +2979.95
All.Data.Parser.ParserD/o-1-space.longest (all,any)                                    111.85                   +1968.33
All.Data.Parser.ParserD/o-1-space.sequenceParser                                      1186.87                    +170.20
All.Data.Parser.ParserD/o-1-space.takeStartBy                                         1154.39                     +18.32

Data.Parser.ParserD(Allocated)
Benchmark                                                                      default(0)(KiB) default(1) - default(0)(%)
------------------------------------------------------------------------------ --------------- --------------------------
All.Data.Parser.ParserD/o-1-space.sequenceParser                                       6239.98                    +187.57
All.Data.Parser.ParserD/o-1-space.takeStartBy                                          6239.97                     +25.15

Data.Stream.StreamD(cpuTime)
Benchmark                                                                  default(0)(ns) default(1) - default(0)(%)
-------------------------------------------------------------------------- -------------- --------------------------
All.Data.Stream.StreamD/o-n-stack.transformationX4.intersperse                      95.49                     +77.64
All.Data.Stream.StreamD/o-1-space.concat.concatMapRepl (sqrt n of sqrt n)       916629.00                     +65.71
All.Data.Stream.StreamD/o-1-space.zipping.eqBy                                   74516.40                     +33.67
All.Data.Stream.StreamD/o-1-space.concat.concatMapPure (1 of n)                2168480.00                     +32.21
All.Data.Stream.StreamD/o-1-space.concat.concatMapPure (n of 1)                3613160.00                     +32.09
All.Data.Stream.StreamD/o-1-space.concat.concatMap (n of 1)                    2307210.00                     +30.06
All.Data.Stream.StreamD/o-1-space.concat.concatMap (sqrt n of sqrt n)          1046920.00                     +26.21
All.Data.Stream.StreamD/o-1-space.elimination.uncons                           1010590.00                     +23.59
All.Data.Stream.StreamD/o-1-space.elimination.foldBreak                        1053920.00                     +22.11
All.Data.Stream.StreamD/o-1-space.concat.concatMap (1 of n)                    1641800.00                     +21.84
All.Data.Stream.StreamD/o-1-space.zipping.cmpBy                                  99339.70                     +12.78
All.Data.Stream.StreamD/o-n-stack.elimination.headTail                         3296800.00                     +12.35
All.Data.Stream.StreamD/o-1-space.mixed.take-scan                                56622.60                     +10.80
All.Data.Stream.StreamD/o-n-stack.elimination.nullTail                         3310110.00                     +10.66

Data.Stream.StreamD(Allocated)
Benchmark                                                                  default(0)(Bytes) default(1) - default(0)(%)
-------------------------------------------------------------------------- ----------------- --------------------------
All.Data.Stream.StreamD/o-n-stack.transformationX4.intersperse                        639.00                     +90.14
All.Data.Stream.StreamD/o-1-space.concat.concatMapRepl (sqrt n of sqrt n)         5601151.00                     +72.37
All.Data.Stream.StreamD/o-1-space.concat.concatMap (sqrt n of sqrt n)             4019762.00                     +39.94
All.Data.Stream.StreamD/o-1-space.concat.concatMap (1 of n)                       7996807.00                     +39.87
All.Data.Stream.StreamD/o-n-stack.elimination.nullTail                           11183588.00                     +28.84
All.Data.Stream.StreamD/o-n-stack.elimination.headTail                           11183588.00                     +28.72
All.Data.Stream.StreamD/o-1-space.elimination.foldBreak                           6389751.00                     +25.15
All.Data.Stream.StreamD/o-1-space.elimination.uncons                              6389751.00                     +25.15
All.Data.Stream.StreamD/o-n-stack.elimination.tail                                7199470.00                     +22.44
All.Data.Stream.StreamD/o-1-space.concat.concatMapPure (sqrt n of sqrt n)         7241573.00                     +22.23
All.Data.Stream.StreamD/o-1-space.concat.concatMapPure (1 of n)                  14384460.00                     +22.20
All.Data.Stream.StreamD/o-1-space.concat.concatMap (n of 1)                       9596160.00                     +16.55
All.Data.Stream.StreamD/o-1-space.nested.filterAllOutPure                        16074195.00                     +10.16
All.Data.Stream.StreamD/o-1-space.nested.filterAllOut                            16074195.00                     +10.16
All.Data.Stream.StreamD/o-1-space.concat.concatMapPure (n of 1)                  19173230.00                      +8.19

Data.Unfold(Allocated)
Benchmark                                                                      default(0)(KiB) default(1) - default(0)(%)
------------------------------------------------------------------------------ --------------- --------------------------
All.Data.Unfold/o-1-space.generation.fromStreamD                                       3904.70                    +100.00

Prelude.Serial(cpuTime)
Benchmark                                                                                                     default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------------------------------------------- -------------- --------------------------
All.Prelude.Serial/o-1-space.grouping.groups                                                                           55.97                   +1903.96
All.Prelude.Serial/o-1-space.grouping.groupsByEq                                                                       55.95                   +1868.08
All.Prelude.Serial/o-1-space.generation.intFromThenTo                                                                  37.27                    +902.48
All.Prelude.Serial/o-1-space.generation.enumerateFromThenTo                                                            37.51                    +896.71
All.Prelude.Serial/o-1-space.elimination.the                                                                           37.29                    +799.84
All.Prelude.Serial/o-1-space.generation.integerFromStep                                                               772.55                    +460.63
All.Prelude.Serial/o-1-space.multi-stream-pure.eqBy                                                                    99.51                    +127.23
All.Prelude.Serial/o-1-space.multi-stream-pure./=                                                                      99.46                    +127.10
All.Prelude.Serial/o-1-space.multi-stream-pure.==                                                                      99.54                    +126.35
All.Prelude.Serial/o-1-space.multi-stream-pure.cmpBy                                                                  112.14                    +109.75
All.Prelude.Serial/o-1-space.multi-stream-pure.<                                                                      111.89                    +106.22
All.Prelude.Serial/o-1-space.exceptions/serial.retryUnknown                                                          1228.00                     +81.57
All.Prelude.Serial/o-1-space.exceptions/serial.retryNoneSimple                                                       1601.78                     +73.34
All.Prelude.Serial/o-1-space.concat.concatMapRepl (sqrt n of sqrt n)                                                  834.44                     +72.03
All.Prelude.Serial/o-1-space.generation.repeatM                                                                        37.28                     +60.65
All.Prelude.Serial/o-1-space.concat.concatMap (n of 1)                                                               2211.86                     +54.75
All.Prelude.Serial/o-1-space.exceptions/serial.retryNone                                                             1551.15                     +43.24
All.Prelude.Serial/o-1-space.concat.concatMapM (n of 1)                                                              2383.25                     +39.20
All.Prelude.Serial/o-1-space.insertingX4.intersperse                                                                 3480.56                     +35.88
All.Prelude.Serial/o-1-space.concat.concatMapPure (n of 1)                                                           3492.24                     +33.11
All.Prelude.Serial/o-1-space.concat.concatMapM (1 of n)                                                              1475.49                     +30.79
All.Prelude.Serial/o-1-space.elimination.build.Identity.foldrMToListLength                                            841.54                     +24.35
All.Prelude.Serial/o-1-space.concat.concatMapPure (1 of n)                                                           2489.95                     +20.46
All.Prelude.Serial/o-1-space.foldable.min (ord)                                                                      1230.44                     +19.67
All.Prelude.Serial/o-n-heap.buffered.reverse                                                                         5910.28                     +18.93
All.Prelude.Serial/o-1-space.generation.IsString.fromString                                                           611.51                     +16.58
All.Prelude.Serial/o-n-heap.toList.toListRev                                                                         6355.63                     +16.26
All.Prelude.Serial/o-1-space.elimination.reduce.IO.foldl1'                                                           1187.01                     +15.85
All.Prelude.Serial/o-n-space.foldr.foldrM/reduce/Identity (sum)                                                      1476.97                     +14.01
All.Prelude.Serial/o-1-space.concat.concatMapM (sqrt n of sqrt n)                                                     818.29                     +12.94
All.Prelude.Serial/o-1-space.multi-stream.eqBy                                                                         99.34                     +12.88
All.Prelude.Serial/o-n-heap.foldl.foldl'/build/Identity                                                              7373.49                     +12.71
All.Prelude.Serial/o-1-space.elimination.uncons                                                                      1283.53                     +11.91
All.Prelude.Serial/o-n-heap.foldl.foldlM'/build/IO                                                                   6233.96                     +11.62
All.Prelude.Serial/o-1-space.filteringX4.foldFilter-even                                                             4789.54                     +10.74
All.Prelude.Serial/o-n-heap.buffered.reverse'                                                                         123.23                     +10.11

Prelude.Serial(Allocated)
Benchmark                                                                                                     default(0)(KiB) default(1) - default(0)(%)
------------------------------------------------------------------------------------------------------------- --------------- --------------------------
All.Prelude.Serial/o-1-space.generation.integerFromStep                                                               1558.40                     +98.77
All.Prelude.Serial/o-1-space.exceptions/serial.retryUnknown                                                           7809.37                     +89.81
All.Prelude.Serial/o-1-space.concat.concatMapRepl (sqrt n of sqrt n)                                                  5485.84                     +71.36
All.Prelude.Serial/o-1-space.exceptions/serial.retryNoneSimple                                                       11710.86                     +66.58
All.Prelude.Serial/o-1-space.exceptions/serial.retryNone                                                              9370.78                     +41.50
All.Prelude.Serial/o-n-heap.toList.toListRev                                                                          3875.85                     +40.73
All.Prelude.Serial/o-n-heap.foldl.foldlM'/build/IO                                                                    3888.57                     +40.27
All.Prelude.Serial/o-1-space.concat.concatMapM (sqrt n of sqrt n)                                                     3925.56                     +39.94
All.Prelude.Serial/o-1-space.concat.concatMap (sqrt n of sqrt n)                                                      3925.56                     +39.94
All.Prelude.Serial/o-1-space.concat.concatMapM (1 of n)                                                               7809.39                     +39.86
All.Prelude.Serial/o-1-space.concat.concatMap (1 of n)                                                                7809.39                     +39.86
All.Prelude.Serial/o-n-heap.foldl.foldlM'/build/Identity                                                              3875.95                     +39.75
All.Prelude.Serial/o-n-heap.foldl.foldl'/build/IO                                                                     3888.59                     +39.28
All.Prelude.Serial/o-n-heap.buffered.reverse                                                                          3888.73                     +39.28
All.Prelude.Serial/o-n-heap.foldl.foldl'/build/Identity                                                               3875.95                     +38.42
All.Prelude.Serial/o-1-space.foldable.min (ord)                                                                       6239.97                     +25.15
All.Prelude.Serial/o-1-space.elimination.reduce.IO.foldl1'                                                            6239.97                     +25.15
All.Prelude.Serial/o-1-space.insertingX4.intersperse                                                                 25766.76                     +21.14
All.Prelude.Serial/o-1-space.concat.concatMapPure (sqrt n of sqrt n)                                                  7858.93                     +20.02
All.Prelude.Serial/o-1-space.concat.concatMapPure (1 of n)                                                           15618.76                     +20.00
All.Prelude.Serial/o-1-space.elimination.uncons                                                                       7809.37                     +20.00
All.Prelude.Serial/o-n-heap.buffered.intersectBy (sqrtVal)                                                              39.78                     +19.65
All.Prelude.Serial/o-1-space.concat.concatMapM (n of 1)                                                               9371.25                     +16.55
All.Prelude.Serial/o-1-space.concat.concatMap (n of 1)                                                                9371.25                     +16.55
All.Prelude.Serial/o-1-space.concat-foldable.foldMapWith (<>) (Stream)                                               21861.11                     +10.69
All.Prelude.Serial/o-1-space.mapping.foldrS                                                                          16377.52                      +9.71
All.Prelude.Serial/o-1-space.elimination.init                                                                        18725.03                      +8.35
All.Prelude.Serial/o-n-heap.buffered.joinInner (sqrtVal)                                                             18853.86                      +8.33
All.Prelude.Serial/o-1-space.concat.concatMapPure (n of 1)                                                           18701.17                      +8.32

Also see #1709 and #1710. The results above are after the fix for #1709.
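
For readers unfamiliar with the change being measured, here is a minimal sketch of what "generating a stream with an unfold" means. It uses only base. The types Step, Stream and Unfold below are simplified, pure stand-ins written for illustration; they are not streamly's actual definitions (which are monadic and also have a Skip step), and the function names are hypothetical. The sketch only shows the shape of the two approaches and does not identify the cause of the regressions above.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Simplified, pure step type (illustrative; not streamly's definition).
data Step s a = Yield a s | Stop

-- Direct-style stream: a step function plus its initial state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Unfold: a step function plus an "inject" function that builds the
-- initial state from a seed value.
data Unfold a b = forall s. Unfold (s -> Step s b) (a -> s)

-- Direct generation: the stream is constructed from its arguments.
enumerateFromToDirect :: Int -> Int -> Stream Int
enumerateFromToDirect from to = Stream step from
  where
    step i
      | i > to = Stop
      | otherwise = Yield i (i + 1)

-- The same generator written as an unfold over a (from, to) seed.
-- The upper bound now travels inside the state instead of a closure.
enumerateFromToUnfold :: Unfold (Int, Int) Int
enumerateFromToUnfold = Unfold step id
  where
    step (i, to)
      | i > to = Stop
      | otherwise = Yield i (i + 1, to)

-- Turn an unfold and a seed into a stream.
unfoldStream :: Unfold a b -> a -> Stream b
unfoldStream (Unfold step inject) seed = Stream step (inject seed)

-- Generation routed through the unfold, analogous in spirit to what
-- the use-unfolds flag does for the generation routines.
enumerateFromToViaUnfold :: Int -> Int -> Stream Int
enumerateFromToViaUnfold from to =
  unfoldStream enumerateFromToUnfold (from, to)

-- Consume a stream into a list so the two variants can be compared.
toList :: Stream a -> [a]
toList (Stream step s0) = go s0
  where
    go s = case step s of
      Yield a s' -> a : go s'
      Stop -> []
```

Loaded in GHCi, toList (enumerateFromToDirect 1 5) and toList (enumerateFromToViaUnfold 1 5) both evaluate to [1,2,3,4,5]. The unfold-based variant goes through an extra inject step and a tupled state, which the compiler has to optimize away for the two to perform alike.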

harendra-kumar, Jul 18 '22 08:07