SOP lists aren't faster than built-in lists
With #5711 we get built-in types that are either on par with SOPs or even faster. Filtering out irrelevant info from the latter set of benchmarks, we get the following (note that "Scott" is a misnomer in these names; it should be "Sop"):
```
sum/compiled-from-Haskell/sum-right-builtin/10000   12.02 ms
sum/compiled-from-Haskell/sum-right-Scott/10000     14.19 ms
sum/compiled-from-Haskell/sum-left-builtin/10000    12.17 ms
sum/compiled-from-Haskell/sum-left-Scott/10000      13.52 ms
sum/hand-written-PLC/sum-right-builtin/10000        7.865 ms
sum/hand-written-PLC/sum-right-Scott/10000          8.258 ms
sum/hand-written-PLC/sum-left-builtin/10000         6.556 ms
sum/hand-written-PLC/sum-left-Scott/10000           7.142 ms
```
SOP lists are consistently slower than built-in lists, even though the implementation of the SOP-handling functions
```haskell
{-# INLINABLE foldLeftScott #-}
foldLeftScott :: (b -> a -> b) -> b -> [a] -> b
foldLeftScott _ z []     = z
foldLeftScott f z (x:xs) = foldLeftScott f (f z x) xs

{-# INLINABLE sumLeftScott #-}
sumLeftScott :: [Integer] -> Integer
sumLeftScott l = foldLeftScott (+) 0 l

{-# INLINABLE foldRightScott #-}
foldRightScott :: (a -> b -> b) -> b -> [a] -> b
foldRightScott _ z []     = z
foldRightScott f z (x:xs) = f x $! (foldRightScott f z xs)

{-# INLINABLE sumRightScott #-}
sumRightScott :: [Integer] -> Integer
sumRightScott l = foldRightScott (+) 0 l
```
does not appear any worse than the implementation of the built-in-list-handling functions:
```haskell
{-# INLINABLE foldLeftBuiltin #-}
foldLeftBuiltin :: (b -> a -> b) -> b -> BI.BuiltinList a -> b
foldLeftBuiltin f z l = B.matchList l z (\x xs -> foldLeftBuiltin f (f z x) xs)

{-# INLINABLE sumLeftBuiltin #-}
sumLeftBuiltin :: BI.BuiltinList Integer -> Integer
sumLeftBuiltin l = foldLeftBuiltin B.addInteger 0 l

{-# INLINABLE foldRightBuiltin #-}
foldRightBuiltin :: (a -> b -> b) -> b -> BI.BuiltinList a -> b
foldRightBuiltin f z l = B.matchList l z (\x xs -> f x $! (foldRightBuiltin f z xs))

{-# INLINABLE sumRightBuiltin #-}
sumRightBuiltin :: BI.BuiltinList Integer -> Integer
sumRightBuiltin l = foldRightBuiltin B.addInteger 0 l
```
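For intuition (and as a quick sanity check that the two shapes agree), here is a plain-Haskell sketch: `matchList` is modelled as an ordinary continuation-style eliminator standing in for the real builtin (which works on `BuiltinList`, not `[a]`), and the left folds are restated against it, with the INLINABLE pragmas omitted:

```haskell
-- Hypothetical plain-Haskell model of the matchList eliminator: return the
-- nil result, or apply the continuation to head and tail.
matchList :: [a] -> r -> (a -> [a] -> r) -> r
matchList []     z _ = z
matchList (x:xs) _ f = f x xs

-- The built-in-style left fold from above, restated against this model.
foldLeftBuiltin :: (b -> a -> b) -> b -> [a] -> b
foldLeftBuiltin f z l = matchList l z (\x xs -> foldLeftBuiltin f (f z x) xs)

-- The Scott-named left fold, which is ordinary Haskell already.
foldLeftScott :: (b -> a -> b) -> b -> [a] -> b
foldLeftScott _ z []     = z
foldLeftScott f z (x:xs) = foldLeftScott f (f z x) xs

main :: IO ()
main = do
  print (foldLeftBuiltin (+) 0 [1 .. 10000 :: Integer])  -- 50005000
  print (foldLeftScott   (+) 0 [1 .. 10000 :: Integer])  -- 50005000
```

Both variants do the same amount of "look at the list, dispatch on nil/cons" work per element; the difference being benchmarked is purely in how that dispatch is represented in UPLC.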
This is ridiculous. Even accounting for noise in the benchmarks and the recent work on speeding up FrameCases (#5818, #5813, #5816), SOPs still don't appear measurably faster than builtins (assuming faster pattern matching for builtins lands). How come? Is it all the lambda wrangling that makes them slow, or is there something else?
Or maybe we should shrug it off, since it's just lists and Data-encoded data types are about 2.5-3x slower than SOPs? Still, I'd expect SOP lists to be faster than built-in lists.
I agree this is surprising. Let's check history.
- Here's the switch to SOPs: https://github.com/IntersectMBO/plutus/pull/5255/files#diff-50270fcb37c26bd85f312f874c2f8cf09acf92278b05ea02f467641953e9915d
- Before: scott=15..., builtins=16...
- After: sops=13..., builtins=16...
- Today: sops=13..., builtins=17...
So builtin lists were not bad before! And for some reason, the effect of the SOPs change was less dramatic on the list benchmarks anyway :thinking: That actually makes me wonder... could the particular list fold benchmarks be fast enough that they're being dominated by the cost of the additions or the recursion? That would shrink the relative differences from the datatype wrangling.
But yeah, it's surprising that you're able to get it down so low. Alternatively, maybe we shouldn't be surprised that our builtin data structures are decent. SOPs just give us that level of performance for user-defined data structures :grin: At the end of the day, they're both doing a hopefully similar amount of work in terms of looking at a tag and dispatching, I guess you've just managed to get the builtin machinery overhead down extremely low!
> could the particular list fold benchmarks be fast enough that they're being dominated by the cost of the additions or the recursion?
You could maybe test that by setting the addition costs to zero in builtinCostModel.json. Alternatively, uplc -t will tell you how much of the execution time is spent on different things according to the cost model, though its predictions may not correspond precisely to what happens when you run the benchmarks.
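For illustration, zeroing out the addInteger CPU costs in builtinCostModel.json might look roughly like this (the field layout is from memory and may differ between plutus-core versions, and the memory numbers shown are placeholders, so treat this purely as a sketch):

```json
"addInteger": {
    "cpu": {
        "arguments": { "intercept": 0, "slope": 0 },
        "type": "max_size"
    },
    "memory": {
        "arguments": { "intercept": 1, "slope": 1 },
        "type": "max_size"
    }
}
```

With the CPU intercept and slope both zero, any remaining reported cost on the sum benchmarks would have to come from the recursion and pattern-matching machinery rather than the additions.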
> That actually makes me wonder... could the particular list fold benchmarks be fast enough that they're being dominated by the cost of the additions or the recursion? That would shrink the relative differences from the datatype wrangling.
I do think that those benchmarks are dominated by the cost of recursion and additions; however, merely speeding up pattern matching on built-in lists gives us:
```
sum/compiled-from-Haskell/sum-left-builtin/10000    14.25 ms    11.95 ms    -16.1%
sum/hand-written-PLC/sum-right-builtin/10000        9.530 ms    7.806 ms    -18.1%
```
So at the very least we know that plenty of time is indeed spent doing pattern matching (and it's not that speeding it up eliminates that cost completely; we still spend time there). So I think the benchmarks are properly sensitive to the cost of pattern matching, and speeding up pattern matching is exactly what SOPs are meant to do.
> But yeah, it's surprising that you're able to get it down so low. Alternatively, maybe we shouldn't be surprised that our builtin data structures are decent. SOPs just give us that level of performance for user-defined data structures 😁 At the end of the day, they're both doing a hopefully similar amount of work in terms of looking at a tag and dispatching, I guess you've just managed to get the builtin machinery overhead down extremely low!
Lists specifically are less efficient than they could be though because of the way we represent tags for values of polymorphic built-in types. If we ever resolve #4781, built-in lists will become faster.
Latest numbers are:
```
sum/compiled-from-Haskell/sum-right-builtin/5000    5.467 ms
sum/compiled-from-Haskell/sum-right-Scott/5000      5.410 ms
sum/compiled-from-Haskell/sum-left-builtin/5000     5.126 ms
sum/compiled-from-Haskell/sum-left-Scott/5000       5.077 ms
sum/hand-written-PLC/sum-right-builtin/5000         4.209 ms
sum/hand-written-PLC/sum-right-Scott/5000           4.042 ms
sum/hand-written-PLC/sum-left-builtin/5000          3.452 ms
sum/hand-written-PLC/sum-left-Scott/5000            3.563 ms
```
i.e. sops and builtins perform about the same.
> i.e. sops and builtins perform about the same.
An approximate costing model for matchList copied from the one for chooseList gives us budgets that are also on par with SOP lists CPU-wise (SOPs are ~20% more expensive MEM-wise though):
plutus-benchmark/lists/test/Sum/9.6/left-fold-built-in.budget.golden
({cpu: 86545294
| mem: 397232})
vs
plutus-benchmark/lists/test/Sum/9.6/left-fold-scott.budget.golden
({cpu: 87656900
| mem: 484900})
And
plutus-benchmark/lists/test/Sum/9.6/right-fold-built-in.budget.golden
({cpu: 91345294
| mem: 427232})
vs
plutus-benchmark/lists/test/Sum/9.6/right-fold-scott.budget.golden
({cpu: 87656900
| mem: 484900})
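The relative differences can be read straight off the left-fold goldens quoted above; a tiny Haskell calculation (using only the numbers from those goldens) makes the "~20% more expensive MEM-wise" claim concrete:

```haskell
-- Percentage overhead of the SOP budget over the built-in budget,
-- using the left-fold golden figures quoted above.
main :: IO ()
main = do
  let pct new old = round (100 * (new - old) / old) :: Integer
  -- CPU: 87656900 (SOP) vs 86545294 (builtin), about 1% apart
  print (pct 87656900 86545294)
  -- MEM: 484900 (SOP) vs 397232 (builtin), about 22% apart
  print (pct 484900 397232)
```

So on this costing model the two representations are within noise on CPU, and the gap is essentially all on the memory side.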
Latest data:
plutus-benchmark/lists/test/Sum/9.6/left-fold-built-in.eval.golden
CPU: 122_258_594
Memory: 523_832
Size: 78
(con integer 5050)
vs
plutus-benchmark/lists/test/Sum/9.6/left-fold-scott.eval.golden
CPU: 69_848_900
Memory: 373_600
Size: 263
(con integer 5050)
I.e. builtins are now slower, as they should be. The new costing model in #6087 had the biggest impact.
Mystery solved I guess.