plutus icon indicating copy to clipboard operation
plutus copied to clipboard

[Epic] CEK machine improvements

Open effectfully opened this issue 1 year ago • 1 comments

This issue is for dumping all the plans regarding optimization of the CEK machine in a largely unstructured way.

effectfully avatar May 04 '24 04:05 effectfully

  • [ ] we should try specializing the CEK machine at the restricting ExBudgetMode as it's the one that is used in production and having that code specialized may impact performance significantly
  • [ ] do we really need to do costing in ST? Can we just subtract from the current budget purely or whatever?
  • [ ] do we really need slippage in production? Can we just keep subtracting from the main budget and that's it?
  • [x] Data.RandomAccessList.Class.indexOne returns a Maybe, the strict EvaluationResult instead would be more efficient. Or we could get rid of the Maybe entirely. PR: #6663 (-12%), #6702 (-1%)
  • [x] We should statically unroll itraverseCounter_ in order not do it at runtime. PR: #5265 (-2.5%)
  • [x] GHC Core for UntypedPlutusCore.Evaluation.Machine.Cek.ExBudgetMode contains unnecessary packing and unpacking, for example 0# -> jump $j_sGXo r#_iGvO (I# r#_iGvO). PR: #6705 (-1.8%)
  • [ ] UntypedPlutusCore.Evaluation.Machine.Cek.Internal for the Vector-based Case-expressions is unoptimized: it contains things like int2Word# followed by a word2Int# (i.e. a non-free no-op) and what looks like an unnecessary call to integerFromWord64#. UPD: will be irrelevant once #7012 is merged
  • [ ] Is it possible to support join points in the CEK machine? The "Compiling without continuations" paper seems to say “yes”. If we had join points, we could’ve better implemented case-of-case (as per the paper) and possibly CSE
  • [ ] currently we implement the stack in the CEK machine as a pure data structure, but it is conceivable that a mutable one would be faster, so we should try it out. Doesn’t seem to be too complicated, given that the CEK machine runs in the ST monad anyway.
  • [ ] currently we use Vector in FrameCases. Would it be more efficient to use Array/SmallArray/custom data/array-primops/manual loop unrolling? UPD: Yes: #7012, but is blocked on benchmarks including deserialization (or whatever) costs
  • [ ] we should make the CEK machine parameterized by the type of names (using a LookupName name env class or whatever), so that we don't need to juggle those performance-hurting textual names -- as we should've done originally anyway (extricated from #5834). PR (very unexpected results): #6706
  • [ ] discharging-related bookkeeping costs us some, we should try dropping discharging at least to see how much it costs us, we're now supposed to return () anyway
  • [x] we should return a dedicated CekResult value from the CEK machine, not just tuples. Would be a tad more efficient a whole lot less potentially problematic. PR: #7272
  • [ ] Note [Term constructor ordering and numbers] explains how ordering of constructors has an impact of efficiency, but it seems like the Case and Constr constructors don't align with that reasoning. Also the Note says there's 8 constructors, but there's 10 now
  • [ ] Quoting Kenneth: I'm also wondering about the other implementations of RandomAccessList here. I don't remember the details, but I think we probably tried a number of different implementations in order to see which was the best. I ran index-envs-bench and looked at the html report and it looks as if the different versions win and lose in different places. In particular SkewBinarySlab looks as if it performs quite similarly. It mght be worth (a) refactoring SkewBinarySlab in a smilar way to SkewBinary (having similar implementations might help with maintainability anyway), and (b) running all of the benchmarks with CEK machines modified to use each of the different implementations in order to check that SekwBinary is still the best thing to use.
  • [x] The check that w==1 is probably redundant here as long as the trees are well-constructed (i.e. the RAList implementation is correct). This could be turned into an assert (which is discarded AFAIK for -O1) or removed altogether https://github.com/IntersectMBO/plutus/blob/3179337dd588970310e390cc8496dfbf3c21e322/plutus-core/index-envs/src/Data/RandomAccessList/SkewBinary.hs#L110 No benefit, see https://github.com/IntersectMBO/plutus/pull/7268

effectfully avatar May 04 '24 04:05 effectfully