plutus icon indicating copy to clipboard operation
plutus copied to clipboard

Optimize division-related Integer operations

Open kozross opened this issue 1 year ago • 1 comments

Describe the feature you'd like

The division-related primops in Plutus are essentially direct copies of their GHC counterparts. While this is a good thing in general, there are some inefficiencies we inherit as a result of this.

The biggest example is that (for positive values at least), quot operations do not reduce to right shifts, even on constant arguments:

     bgroup
        "quotient"
        [ bench "naive with constant" . nf (quot (170141183460469231731687303715 :: Integer)) $ 1073741824,
          bench "shift with constant" . nf (unsafeShiftR (170141183460469231731687303715 :: Integer)) $ 30
        ]
-- Produces the following result
--   quotient
--    naive with constant: OK
--      32.1 ns ± 2.4 ns, 143 B  allocated,   0 B  copied,  33 MB peak memory
--    shift with constant: OK
--      17.3 ns ± 1.5 ns,  86 B  allocated,   0 B  copied,  33 MB peak memory

This is a non-trivial difference in both time and space, and while the numbers appear large, they are not particularly big for Integers. This inefficiency extends to rem operations, which could be implemented with masking.

While this appears to be of minor benefit, we observe that quot n (2 ^ k * d) = quot (quot n (2 ^ k)) d for positive numbers. Furthermore, as per #6390 , we can 'decompose' numbers which have a power of 2 in their prime factorization fairly easily (counting trailing bits), which, combined with the above observation, could give us considerable performance. This also relates to the cases raised in #6390, just in the other direction.

Similar techniques could be applied to div and mod as well (or rather, our definitions of them) to produce improvements.

Describe alternatives you've considered

Currently, these optimizations cannot be obtained. Even using CIP-121, CIP-122 and CIP-123 operations, due to the currently-quadratic cost of CIP-121 conversions, this would drown out any benefits we're likely to see. Even though CIP-121 operations could be linear, this would still be an unavoidable pair of copies, due to pinnedness issues in ghc-bignum.

kozross avatar Aug 08 '24 01:08 kozross

This is a non-trivial difference in both time and space

It's very likely a drop in the ocean, because you don't just execute quot at Plutus runtime, you execute this:

monster
BuiltinExpectArgument
  (\ (v_i3DiB :: CekValue DefaultUni DefaultFun ()) ->
     case v_i3DiB of {
       __DEFAULT -> lvl136_r3JLL;
       VCon val_i3GRr ->
         let! { ValueOf uniAct_s3J31 x7_s3J32 ~
         <- val_i3GRr `cast` <Co:9> :: ... } in
         case uniAct_s3J31 `cast` <Co:9> :: ...
         of wild5_i3DiQ {
           __DEFAULT -> <failure>
           DefaultUniInteger co6_i3DiR ->
             BuiltinExpectArgument
               (\ (v1_i3Dj3
                     :: CekValue
                          DefaultUni DefaultFun ()) ->
                  case v1_i3Dj3 of {
                    __DEFAULT -> <failure>;
                    VCon val1_X3 ->
                      let! { ValueOf uniAct1_s3J3a
                                     x11_s3J3b ~
                      <- val1_X3 `cast` <Co:9> :: ... } in
                      case uniAct1_s3J3a
                           `cast` <Co:9> :: ...
                      of wild8_i3Dji {
                        __DEFAULT -> <failure>
                        DefaultUniInteger co9_i3Djj ->
                          let {
                            eta17_i3DiX :: CostStream
                            eta17_i3DiX
                              = let! { __DEFAULT ~ ww_i1jEI
                                <- $wmemoryUsageInteger
                                     (x7_s3J32
                                      `cast` <Co:3> :: ...) } in
                                CostLast ww_i1jEI } in
                          let {
                            mem5_i3Djo :: CostStream
                            mem5_i3Djo
                              = let! { __DEFAULT ~ ww_i1jEI
                                <- $wmemoryUsageInteger
                                     (x11_s3J3b
                                      `cast` <Co:3> :: ...) } in
                                CostLast ww_i1jEI } in
                          join {
                            $j4_i3Dju
                              :: ExBudgetStream
                                 -> BuiltinRuntime
                                      (CekValue
                                         DefaultUni
                                         DefaultFun
                                         ())
                            $j4_i3Dju (nt_i3Djv
                                         :: ExBudgetStream)
                              = let! { __DEFAULT ~ nt1_i3Djw
                                <- nt_i3Djv } in
                                BuiltinCostedResult
                                  nt1_i3Djw
                                  (join {
                                     $j5_i3DjF
                                       :: BuiltinResult
                                            (CekValue
                                               DefaultUni
                                               DefaultFun
                                               ())
                                     $j5_i3DjF
                                       = case x11_s3J3b
                                              `cast` <Co:3> :: ...
                                         of wild9_i3DjP {
                                           IS x12_i3DjQ ->
                                             case x12_i3DjQ
                                             of {
                                               __DEFAULT ->
                                                 let! { __DEFAULT ~ conrep180_i3DjH
                                                 <- integerQuot
                                                      (x7_s3J32
                                                       `cast` <Co:3> :: ...)
                                                      wild9_i3DjP } in
                                                 let! { UnsafeRefl co12_i3DjN ~
                                                 <- unsafeEqualityProof } in
                                                 BuiltinSuccess
                                                   (VCon
                                                      ((ValueOf
                                                          ($WDefaultUniInteger
                                                           `cast` <Co:50> :: ...)
                                                          conrep180_i3DjH)
                                                       `cast` <Co:14> :: ...));
                                               0# ->
                                                 case divZeroError
                                                 of wild11_00 {
                                                 }
                                             };
                                           IP x12_i3DjT ->
                                             let! { __DEFAULT ~ conrep180_i3DjH
                                             <- integerQuot
                                                  (x7_s3J32
                                                   `cast` <Co:3> :: ...)
                                                  wild9_i3DjP } in
                                             let! { UnsafeRefl co12_i3DjN ~
                                             <- unsafeEqualityProof } in
                                             BuiltinSuccess
                                               (VCon
                                                  ((ValueOf
                                                      ($WDefaultUniInteger
                                                       `cast` <Co:50> :: ...)
                                                      conrep180_i3DjH)
                                                   `cast` <Co:14> :: ...));
                                           IN x12_i3DjV ->
                                             let! { __DEFAULT ~ conrep180_i3DjH
                                             <- integerQuot
                                                  (x7_s3J32
                                                   `cast` <Co:3> :: ...)
                                                  wild9_i3DjP } in
                                             let! { UnsafeRefl co12_i3DjN ~
                                             <- unsafeEqualityProof } in
                                             BuiltinSuccess
                                               (VCon
                                                  ((ValueOf
                                                      ($WDefaultUniInteger
                                                       `cast` <Co:50> :: ...)
                                                      conrep180_i3DjH)
                                                   `cast` <Co:14> :: ...))
                                         } } in
                                   case x11_s3J3b
                                        `cast` <Co:3> :: ...
                                   of {
                                     IS x12_i3DjY ->
                                       case x12_i3DjY of {
                                         __DEFAULT ->
                                           jump $j5_i3DjF;
                                         0# -> lvl92_r3JIJ
                                       };
                                     IP x12_i3Dk4 ->
                                       jump $j5_i3DjF;
                                     IN x12_i3Dk6 ->
                                       jump $j5_i3DjF
                                   }) } in
                          case runCpu_i3Diz
                                 eta17_i3DiX mem5_i3Djo
                          of wild9_i3Dk8 {
                            CostLast bx_i3Dk9 ->
                              case runMem_i3DiA
                                     eta17_i3DiX mem5_i3Djo
                              of wild10_i3Dkb {
                                CostLast bx1_i3Dkc ->
                                  jump $j4_i3Dju
                                    (ExBudgetLast
                                       (ExBudget
                                          bx_i3Dk9
                                          bx1_i3Dkc));
                                CostCons ipv_i3Dke
                                         ipv1_i3Dkf ->
                                  jump $j4_i3Dju
                                    (zipCostStreamGo
                                       wild9_i3Dk8
                                       wild10_i3Dkb)
                              };
                            CostCons ipv_i3Dki
                                     ipv1_i3Dkj ->
                              jump $j4_i3Dju
                                (zipCostStreamGo
                                   wild9_i3Dk8
                                   (runMem_i3DiA
                                      eta17_i3DiX
                                      mem5_i3Djo))
                          }
                      }
                  })
         }
     })) } in

Most of that is pretty cheap, but is still overhead and I really doubt that 15 extra nanoseconds gonna make a difference.

I think this is very low-priority, so I'm going to triage it as such.

effectfully avatar Aug 08 '24 11:08 effectfully

Thank you for providing this context.

I doubt we're going to consider investigating this in any foreseeable future. There's so much we can optimize and extra 15ns of builtin performance, requiring not only non-trivial optimizations but also non-trivial costing changes... just not worth it.

I'm therefore closing the issue. Maybe in 20 years someone will find a specific use case where those 15 nanoseconds make a difference.

effectfully avatar Jun 25 '25 20:06 effectfully