cardano-ledger icon indicating copy to clipboard operation
cardano-ledger copied to clipboard

Optimize transaction generators

Open nfrisby opened this issue 3 years ago • 2 comments

In their downstream uses in ouroboros-consensus, we've found the Shelley transaction generators are slow. From this excerpt from the .prof summary, it looks like they're allocating a lot and hence are slow.

COST CENTRE                        MODULE                                  SRC                                                              %time %alloc

choose                             Test.QuickCheck.Gen                     Test/QuickCheck/Gen.hs:130:1-59                                   28.3   36.6
genTx                              Test.Shelley.Spec.Ledger.Generator.Utxo src/Test/Shelley/Spec/Ledger/Generator/Utxo.hs:(170,1)-(237,86)    4.5    4.2
shuffle                            Test.QuickCheck.Gen                     Test/QuickCheck/Gen.hs:(217,1)-(219,55)                            3.9    6.9

By accident, we didn't retain enough info to identify the specific problematic calls, but the genTx is at least incriminated. Here's the excerpt from the .prof body:

COST CENTRE   MODULE                                   SRC                                                              no.       entries  %time %alloc   %time %alloc

genTx         Test.Shelley.Spec.Ledger.Generator.Utxo  src/Test/Shelley/Spec/Ledger/Generator/Utxo.hs:(170,1)-(237,86)  111572      19825    4.5    4.2    47.2   66.1
 shuffle      Test.QuickCheck.Gen                      Test/QuickCheck/Gen.hs:(217,1)-(219,55)                          111593     133890    3.9    6.9    36.7   56.8
  choose      Test.QuickCheck.Gen                      Test/QuickCheck/Gen.hs:130:1-59                                  111599          0   28.1   36.4    30.2   42.2

So it's the shuffles in genTx. They're allocating "too much".

At a glance, I think multiple of the uses of shuffle could be manually fused. For example:

https://github.com/input-output-hk/cardano-ledger-specs/blob/befbf160ed840d2d6ad843902c9f668507dc7838/shelley/chain-and-ledger/shelley-spec-ledger-test/src/Test/Shelley/Spec/Ledger/Generator/Utxo.hs#L559-L560

In this one, the utxo list is probably quite large. And then we immediately project out just a few (I think maxNumGenInputs is just 10 here.). So just generating 10 indices without replacement and then projecting those would avoid massive allocation (I think).

Hopefully there are similarly straight-forward opportunities for the other shuffle uses and whatever else follow-up profiling identifies.

nfrisby avatar Oct 08 '20 15:10 nfrisby

@uroboros added a change which gave a 2x speedup here, but it seems that the re-organisation caused a 10x slowdown. So we still have more to do here.

nc6 avatar Oct 28 '20 15:10 nc6

To avoid potential confusion: the 10x number I fed to Clarke was measured as the affect on the (downstream) Shelley ThreadNet tests' runtime -- I don't have intuition for how a 2x upstream improvement will manifest downstream. Could be good, but I'm guessing it'll be deflated (similar reasoning as Amdahl's Law).

nfrisby avatar Oct 28 '20 16:10 nfrisby