bigints Implement floating point conversion

Fixes Feature request: toBiggestInt #34

Jul 26 '22 22:07 rotu

Tbh, I'm not sure this feature is really worth implementing. Imo we shouldn't just implement every feature request without first thinking about it. Do you have any concrete use case? Is there any precedent in some standard bigint implementation out there?

I've modeled it off of https://nim-lang.org/docs/system.html#toBiggestFloat%2CBiggestInt The major use cases are:

for convenient display
for ballpark estimates when the actual value may or may not be needed with full precision

Jul 27 '22 18:07 rotu

for convenient display

for ballpark estimates when the actual value may or may not be needed with full precision

Can you elaborate? I don't see how conversion to float is needed for that.

Jul 27 '22 21:07 konsumlamm

for convenient display

for ballpark estimates when the actual value may or may not be needed with full precision

Can you elaborate? I don't see how conversion to float is needed for that.

Instead of reimplementing display formatting, float conversion allows you to leverage existing tools: https://nim-lang.org/docs/strformat.html#formatting-floats

Also, floating point allows you to do operations that may not be an integer, such as square roots. While no floating point can represent irrational numbers, an estimate may be good enough.

Jul 28 '22 00:07 rotu

I added a note above on why I actually think toBiggestFloat (the original proposal, changed in latest commit) is a better api. I think it could be interesting to hear the opinion of @narimiran on this matter.

Jul 29 '22 10:07 pietroppeter

for convenient display

for ballpark estimates when the actual value may or may not be needed with full precision

Can you elaborate? I don't see how conversion to float is needed for that.

Instead of reimplementing display formatting, float conversion allows you to leverage existing tools: https://nim-lang.org/docs/strformat.html#formatting-floats

Also, floating point allows you to do operations that may not be an integer, such as square roots. While no floating point can represent irrational numbers, an estimate may be good enough.

Large BigInt values cannot be converted to float64 or they just become inf. Here is example code that returns an estimate as mantissa and exponent part and approximated square root:

import std/[math, strformat, options]
import bigints

proc approximate10(x: BigInt; precision: range[1..18]): tuple[mantissa, exponent: int] =
  # int.high.float.log10 ≒ 18.96
  let
    # https://en.wikipedia.org/wiki/Logarithm#Change_of_base
    log10 = (x.fastLog2.float / 3.321928).int  # log2(10) = 3.321928
    exp = max(0, log10 - precision)

  ((x div 10.initBigInt.pow(exp)).toInt[:int].get, exp)

proc test =
  block:
    let
      a = 10.initBigInt.pow(40)
      (m, e) = a.approximate10(5)

    echo a
    echo &"≒ {m} * 10^{e}"

  block:
    let
      a = 1234567890.initBigInt
      (m, e) = a.approximate10(3)

    echo a
    echo &"≒ {m} * 10^{e}"

test()

proc sqrt(x: BigInt; maxIter: int): BigInt =
  # Quick approximated value
  #result = 1.initBigInt shl (x.fastLog2 div 2)
  #result = x shr (x.fastLog2 div 2 + 1)
  let
    log2 = x.fastLog2
    b2 = (x shr (log2 - 1)).toInt[:int].get and 1
  result = (1.initBigInt shl (log2 div 2)) or ((log2 mod 2).initBigInt shl (log2 div 2 - 2 + b2))

  # Refine value with Newton's method
  # https://en.wikipedia.org/wiki/Newton's_method
  for i in 1..maxIter:
    result = (result * result + x) div (result * 2.initBigInt)

proc testSqrt =
  block:
    var a = (256 + 128).initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

  block:
    var a = 4000000.initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

  block:
    var a = 6000000.initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

  block:
    var a = 8000000.initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

  block:
    var a = 9000000.initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

testSqrt()

Example output:

10000000000000000000000000000000000000000
≒ 1000000 * 10^34
1234567890
≒ 1234 * 10^6
384
16
20
19
19
4000000
1536
2070
2001
2000
6000000
2048
2488
2449
2449
8000000
2048
2977
2832
2828
9000000
2560
3037
3000
3000

This is quick implementation and not much tested. But these algorithm would works for any BigInt value.

Jul 31 '22 08:07 demotomohiro

@demotomohiro,

I think you were trying to demonstrate this PR is unnecessary, but have shown that the opposite.

Note that fastLog2 returns a crude (truncated integer) estimate of the base-2 logarithm. You would have gotten correct formatting had you used a conversion with the float formatting functions.

Also, your Newton iteration converges on the wrong result due to quantization error. sqrt(384) is closer to 20 than 19. Plus I'm willing to bet your hardware has accelerated floating point operations that outperform this by several orders of magnitude.

Jul 31 '22 22:07 rotu

@konsumlamm

Is there any precedent in some standard bigint implementation out there?

Yes. There is. In GMP : https://gmplib.org/manual/Converting-Integers

Function: double mpz_get_d_2exp (signed long int *exp, const mpz_t op) Convert op to a double, truncating if necessary (i.e. rounding towards zero), and returning the exponent separately. The return value is in the range 0.5<=abs(d)<1 and the exponent is stored to *exp. d * 2^exp is the (truncated) op value. If op is zero, the return is 0.0 and 0 is stored to *exp. This is similar to the standard C frexp function (see Normalization Functions in The GNU C Library Reference Manual). Notice that they fixed the type of output to double and the rounding mode towards zero is not necessarily the one used in the rest of the program/by the computer.

Sorry, that is not a bigint implementation of the standard library of a language.

Aug 01 '22 08:08 dlesnoff

@demotomohiro,

I think you were trying to demonstrate this PR is unnecessary, but have shown that the opposite.

Note that fastLog2 returns a crude (truncated integer) estimate of the base-2 logarithm. You would have gotten correct formatting had you used a conversion with the float formatting functions.

Also, your Newton iteration converges on the wrong result due to quantization error. sqrt(384) is closer to 20 than 19. Plus I'm willing to bet your hardware has accelerated floating point operations that outperform this by several orders of magnitude.

I just used fastLog2 because it seems you don't need accurate result and want to display it quickly. Even if float formatting functions in standard library works correctly, not all BigInt can be converted to float. Even if it is small enough to be converted to float, converting to float loses precision. When you need accurate correct formatting, you need to do that without converting to float32/64.

When you need an accurate squre root of BitInt, you cannot convert it to float32/64 as it lose precision. Even if sqrtsd/sqrtpd instruction in SSE runs fast, converting BigInt to float takes time and lose precision. So unless you do more float operations after converting to float, converting to float would not much faster. But people actually do many float operations after converting to float?

I'm not sure this feature is really needed or not. But if you just want to display it nicely or get squre root, I think implementing it on bigints is better than converting it to float. Because that should works on any values and you can get accurate result if you want.

Highest exponent of float32/64 is 2^127 / 2^1023. When you convert BitInt to float32, it must be less than 2^128. But when 128bits int is large enough for you, I think you would better to use fixed size int128 (https://github.com/status-im/nim-stint or https://github.com/nim-lang/Nim/blob/devel/compiler/int128.nim) as it is stack based and efficient than BigInt. In case you need BigInt as 256bits int is too small but want to convert it to float64, the value must be smaller than 2^1024.

I think this PR is necessary if there are use cases like, you need a large int more than 256bits but final result is smaller than 2^1023, and you need to write it in specific format that have only float64 or pass it to libraries or API that takes only float64. But I'm not sure if such an use case actually exists.

Aug 02 '22 07:08 demotomohiro

@demotomohiro Yes, as soon as you put a value into a float, you are conceding exactness. If you know the result is needed with full precision and that every intermediate value is an integer, then yes, an integer is more appropriate than a floating point number. If integers work for your use-case, use them!

Optimization is premature here, but conversion to a float is a CONSTANT time operation --- it's faster than a bigint addition, and certainly faster than your Newton iteration.

@dlesnoff Another precedent is Python. The int type in Python is arbitrary precision and it can be explicitly converted to float, which is a 64-bit floating point number. It is also implicitly converted when using non-integer-valued operations like /.

Aug 02 '22 18:08 rotu

@rotu This approximated BigInt sqrt calculates square root by taking highest 62 bits, convert it to float and use math.sqrt. It is constant time as it reads only highest 62 bits. And it works for values larger than 2^1023. https://gist.github.com/demotomohiro/704e6110919cf163de18ab36b8bbef4c

Aug 11 '22 14:08 demotomohiro

@rotu This approximated BigInt sqrt calculates square root by taking highest 62 bits, convert it to float and use math.sqrt. It is constant time as it reads only highest 62 bits. And it works for values larger than 2^1023. https://gist.github.com/demotomohiro/704e6110919cf163de18ab36b8bbef4c

Do you really want to write and check this sort of code for every non-arithmetic function you might use?

Furthermore, this implementation betrays its inaccuracy. You have slight accuracy issues when the square root requires looking at more digits than are in the significand. And returning the result as a bigint falsely implies the result is precise down to the unit.

The correct approach for an "exact" square root of a bigint x should be (IMO) a pair (r, d) such that r^2 <= x, x < (r+1)^2, and r^2+d =x. (this is by analogy with the correct approach to integer division: a quotient remainder pair)

At any rate, while I personally have written code that requires values greater than 2^64, I've never had to use values larger than 2^1023!

My PR is only geared toward the approach where you need an approximate answer - that's why I make no attempt to introduce a floating-point to bigint conversion.

Aug 11 '22 21:08 rotu

Furthermore, this implementation betrays its inaccuracy. You have slight accuracy issues when the square root requires looking at more digits than are in the significand. And returning the result as a bigint falsely implies the result is precise down to the unit.

IEEE_754 64bit float has 53bit significand bits. When the input value is smaller than 2^(53*2), my implementation can lose precision as it convert the result to integer. But when the input value is larger than that, it should be as accurate as converting BigInt to float and call math.sqrt. It takes highest 62 bits that is larger than 64bit floats significand bits.

The correct approach for an "exact" square root of a bigint x should be (IMO) a pair (r, d) such that r^2 <= x, x < (r+1)^2, and r^2+d =x. (this is by analogy with the correct approach to integer division: a quotient remainder pair)

GMP has mpz_sqrt and mpz_sqrtrem. mpz_sqrt returns the truncated integer part of the square root of the input. mpz_sqrtrem returns the square root of the input and remainder. https://gmplib.org/manual/Integer-Roots

At any rate, while I personally have written code that requires values greater than 2^64, I've never had to use values larger than 2^1023!

If you don't use values larger than 2^1023, fixed size int types like https://github.com/status-im/nim-stint might be better than bigints. BigInt type uses 4 * 8 bytes in addition to space that hold the int value in order to it can have variable precision integer. seq uses 3 * 8 bytes for len, p, and cap. As BigInt has isNegative, that adds 1 bytes + 7 bytes of padding. Fixed size int uses all bits for the int value like Nim's int type. For example, when BigInt has value 2^511 - 1, it uses 512 + 4 * 8 * 8 = 768 bits. It is 1.5 times larger than Stint[512]. When BigInt has value 2^1023 - 1, it uses 1024 + 256 = 1280 bits. It is 1.25 times larger than Stint[1024].

Aug 14 '22 19:08 demotomohiro

But when the input value is larger than that, it should be as accurate as converting BigInt to float and call math.sqrt.

Yes. This is NOT necessarily correct down to the unit. Returning the value as a BigInt incorrectly suggests it is! Such errors are expected when you're knowingly working with floating point numbers, which are approximate by nature.

mpz_sqrtrem returns the square root of the input and remainder.

I love when I have the same idea as smarter people than myself!

If you don't use values larger than 2^1023, fixed size int types like https://github.com/status-im/nim-stint might be better than bigints.

True. But this is the 'bigints' repo. I'll defer to the judgement of package maintainers whether this functionality is appropriate here.

Aug 14 '22 21:08 rotu