Optimize division-related Integer operations
Describe the feature you'd like
The division-related primops in Plutus are essentially direct copies of their GHC counterparts. While this is a good thing in general, there are some inefficiencies we inherit as a result of this.
The biggest example is that (for positive values at least), quot operations do not reduce to right shifts, even on constant arguments:
bgroup
"quotient"
[ bench "naive with constant" . nf (quot (170141183460469231731687303715 :: Integer)) $ 1073741824,
bench "shift with constant" . nf (unsafeShiftR (170141183460469231731687303715 :: Integer)) $ 30
]
-- Produces the following result
-- quotient
-- naive with constant: OK
-- 32.1 ns ± 2.4 ns, 143 B allocated, 0 B copied, 33 MB peak memory
-- shift with constant: OK
-- 17.3 ns ± 1.5 ns, 86 B allocated, 0 B copied, 33 MB peak memory
This is a non-trivial difference in both time and space, and while the numbers appear large, they are not particularly big for Integers. This inefficiency extends to rem operations, which could be implemented with masking.
While this appears to be of minor benefit, we observe that quot n (2 ^ k * d) = quot (quot n (2 ^ k)) d for positive numbers. Furthermore, as per #6390 , we can 'decompose' numbers which have a power of 2 in their prime factorization fairly easily (counting trailing bits), which, combined with the above observation, could give us considerable performance. This also relates to the cases raised in #6390, just in the other direction.
Similar techniques could be applied to div and mod as well (or rather, our definitions of them) to produce improvements.
Describe alternatives you've considered
Currently, these optimizations cannot be obtained. Even using CIP-121, CIP-122 and CIP-123 operations, due to the currently-quadratic cost of CIP-121 conversions, this would drown out any benefits we're likely to see. Even though CIP-121 operations could be linear, this would still be an unavoidable pair of copies, due to pinnedness issues in ghc-bignum.
This is a non-trivial difference in both time and space
It's very likely a drop in the ocean, because you don't just execute quot at Plutus runtime, you execute this:
monster
BuiltinExpectArgument
(\ (v_i3DiB :: CekValue DefaultUni DefaultFun ()) ->
case v_i3DiB of {
__DEFAULT -> lvl136_r3JLL;
VCon val_i3GRr ->
let! { ValueOf uniAct_s3J31 x7_s3J32 ~
<- val_i3GRr `cast` <Co:9> :: ... } in
case uniAct_s3J31 `cast` <Co:9> :: ...
of wild5_i3DiQ {
__DEFAULT -> <failure>
DefaultUniInteger co6_i3DiR ->
BuiltinExpectArgument
(\ (v1_i3Dj3
:: CekValue
DefaultUni DefaultFun ()) ->
case v1_i3Dj3 of {
__DEFAULT -> <failure>;
VCon val1_X3 ->
let! { ValueOf uniAct1_s3J3a
x11_s3J3b ~
<- val1_X3 `cast` <Co:9> :: ... } in
case uniAct1_s3J3a
`cast` <Co:9> :: ...
of wild8_i3Dji {
__DEFAULT -> <failure>
DefaultUniInteger co9_i3Djj ->
let {
eta17_i3DiX :: CostStream
eta17_i3DiX
= let! { __DEFAULT ~ ww_i1jEI
<- $wmemoryUsageInteger
(x7_s3J32
`cast` <Co:3> :: ...) } in
CostLast ww_i1jEI } in
let {
mem5_i3Djo :: CostStream
mem5_i3Djo
= let! { __DEFAULT ~ ww_i1jEI
<- $wmemoryUsageInteger
(x11_s3J3b
`cast` <Co:3> :: ...) } in
CostLast ww_i1jEI } in
join {
$j4_i3Dju
:: ExBudgetStream
-> BuiltinRuntime
(CekValue
DefaultUni
DefaultFun
())
$j4_i3Dju (nt_i3Djv
:: ExBudgetStream)
= let! { __DEFAULT ~ nt1_i3Djw
<- nt_i3Djv } in
BuiltinCostedResult
nt1_i3Djw
(join {
$j5_i3DjF
:: BuiltinResult
(CekValue
DefaultUni
DefaultFun
())
$j5_i3DjF
= case x11_s3J3b
`cast` <Co:3> :: ...
of wild9_i3DjP {
IS x12_i3DjQ ->
case x12_i3DjQ
of {
__DEFAULT ->
let! { __DEFAULT ~ conrep180_i3DjH
<- integerQuot
(x7_s3J32
`cast` <Co:3> :: ...)
wild9_i3DjP } in
let! { UnsafeRefl co12_i3DjN ~
<- unsafeEqualityProof } in
BuiltinSuccess
(VCon
((ValueOf
($WDefaultUniInteger
`cast` <Co:50> :: ...)
conrep180_i3DjH)
`cast` <Co:14> :: ...));
0# ->
case divZeroError
of wild11_00 {
}
};
IP x12_i3DjT ->
let! { __DEFAULT ~ conrep180_i3DjH
<- integerQuot
(x7_s3J32
`cast` <Co:3> :: ...)
wild9_i3DjP } in
let! { UnsafeRefl co12_i3DjN ~
<- unsafeEqualityProof } in
BuiltinSuccess
(VCon
((ValueOf
($WDefaultUniInteger
`cast` <Co:50> :: ...)
conrep180_i3DjH)
`cast` <Co:14> :: ...));
IN x12_i3DjV ->
let! { __DEFAULT ~ conrep180_i3DjH
<- integerQuot
(x7_s3J32
`cast` <Co:3> :: ...)
wild9_i3DjP } in
let! { UnsafeRefl co12_i3DjN ~
<- unsafeEqualityProof } in
BuiltinSuccess
(VCon
((ValueOf
($WDefaultUniInteger
`cast` <Co:50> :: ...)
conrep180_i3DjH)
`cast` <Co:14> :: ...))
} } in
case x11_s3J3b
`cast` <Co:3> :: ...
of {
IS x12_i3DjY ->
case x12_i3DjY of {
__DEFAULT ->
jump $j5_i3DjF;
0# -> lvl92_r3JIJ
};
IP x12_i3Dk4 ->
jump $j5_i3DjF;
IN x12_i3Dk6 ->
jump $j5_i3DjF
}) } in
case runCpu_i3Diz
eta17_i3DiX mem5_i3Djo
of wild9_i3Dk8 {
CostLast bx_i3Dk9 ->
case runMem_i3DiA
eta17_i3DiX mem5_i3Djo
of wild10_i3Dkb {
CostLast bx1_i3Dkc ->
jump $j4_i3Dju
(ExBudgetLast
(ExBudget
bx_i3Dk9
bx1_i3Dkc));
CostCons ipv_i3Dke
ipv1_i3Dkf ->
jump $j4_i3Dju
(zipCostStreamGo
wild9_i3Dk8
wild10_i3Dkb)
};
CostCons ipv_i3Dki
ipv1_i3Dkj ->
jump $j4_i3Dju
(zipCostStreamGo
wild9_i3Dk8
(runMem_i3DiA
eta17_i3DiX
mem5_i3Djo))
}
}
})
}
})) } in
Most of that is pretty cheap, but is still overhead and I really doubt that 15 extra nanoseconds gonna make a difference.
I think this is very low-priority, so I'm going to triage it as such.
Thank you for providing this context.
I doubt we're going to consider investigating this in any foreseeable future. There's so much we can optimize and extra 15ns of builtin performance, requiring not only non-trivial optimizations but also non-trivial costing changes... just not worth it.
I'm therefore closing the issue. Maybe in 20 years someone will find a specific use case where those 15 nanoseconds make a difference.