bigints Add and test initRandomBigInt for issue #101

I added random as std dependency. It is now possible to generate a BigInt which is exactly nbits long. I also added a file for tests specific for probabilistic tests.

Aug 10 '22 16:08 dlesnoff

I added an enum type. You may find a less generic name than the one I chose. Another option would be to do two procedures for the initialization of random Bigints : initRandomBigintLimb and initRandomBigintBits I prefer to use an enum type, that's much less to type. I kept the init prefix to differentiate with a proc that would change an already initialized variable. I don't know if that would be useful, I have not implemented that. If we don't do that in the end, we may drop the init prefix -> randomBigint is shorter and much easier to memorize.

All these changes will be really useful for another benchmarking PR I am working on. It also closes issue #101.

Aug 12 '22 12:08 dlesnoff

Let me start this by saying that I don't think that this should be part of the public API.

The proposed function seems to be designed specifically for testing. For this purpose, it looks good to me and I'd be happy to use it for some randomized tests.

However, there are a number of problems for the general use case:

It uses the std/random RNG, so users can't simply use their favourite RNG (afaik there is no standard RNG interface for Nim, even std/sysrand and std/mersenne have different APIs).
The implementation uses rand, which uses the default RNG and is not thread-safe.
It only supports the creation of BigInts of a specific bit/limb size. I imagine most users would want to generate a BigInt between a and b instead.
The current API has no support for any special distributions.
Limb sizes are kind of an implementation detail and I doubt that most users care about them.

It also closes issue #101.

A small note: You need to write something like "closes #101" (in the description or a comment, not the title) so that GitHub actually closes the issue when merging the PR.

Aug 15 '22 09:08 konsumlamm

Thank you for this review ! Indeed, this should be removed from the public API, but the actual export system does not enable to export non-star procs to other modules in the library. We would not be able to use the proc outside of bigints.nim.

The implementation uses rand, which uses the default RNG and is not thread-safe.

That's important to keep in mind for the release of Nim v2, which is promised to be threads on by default.

It only supports the creation of BigInts of a specific bit/limb size. I imagine most users would want to generate a BigInt between a and b instead.

I totally agree with you on that. Actually, I think this can be implemented on top of this current random function and some rejection sampling.

I do not think I should close the issue now, as more random functions for testing have to be added. mratsim proposed two other test functions.

Aug 15 '22 18:08 dlesnoff

Indeed, this should be removed from the public API, but the actual export system does not enable to export non-star procs to other modules in the library. We would not be able to use the proc outside of bigints.nim.

The easiest solution is to simply move the random stuff out of bigints.nim (it doesn't need access to anything private, you can construct a BigInt from a seq[uint32] via initBigInt). This would ofc mean that it can't be used in the bigints module itself, but perhaps it would be better to put a future public random API into its own module anyway?

Aug 16 '22 01:08 konsumlamm

I need the function both in the benchmarks and in the tests which are in two separated folders. I will need to export it anyway.

it doesn't need access to anything private

Currently, it does. I am not sure if I can avoid private membership access.

We make the function slower by proceding in two steps, and takes twice as much memory (since the same seq is created twice, one for the generation of the random number and one for the initialisation of bigint). The fact that the random generation time is negligible compared to the multiplication/division of two integers is crucial for the benchmarks. I am okay with it for now.

I think the code needs refactoring anyway: everything is stored in a single monolithic file. We could very well split bigints.nim into multiple nim files and include them all in bigints.nim. We should separate bigint initialisation, representation (toString), basic operations on it: +, -, *, /, shr, shl and advanced operations (operations that rely on simpler bigint operations): pow, powmod, gcd With the include system, it acts just like the same as the current single file. All the splitted files can go in a subdirectory, like the already existing src/bigints If we want to do a file with helper functions that doesn't go into the public API, how should I call the file ? helper.nim ? Are there any other functions that would go into it ?

Aug 16 '22 09:08 dlesnoff

I need the function both in the benchmarks and in the tests which are in two separated folders. I will need to export it anyway.

It can just be a separate module though (e.g. in tests/ or a new folder).

it doesn't need access to anything private

Currently, it does. I am not sure if I can avoid private membership access.

We make the function slower by proceding in two steps, and takes twice as much memory (since the same seq is created twice, one for the generation of the random number and one for the initialisation of bigint). The fact that the random generation time is negligible compared to the multiplication/division of two integers is crucial for the benchmarks. I am okay with it for now.

It shouldn't copy the seq, at least with ARC/ORC, since it is a sink parameter.

I think the code needs refactoring anyway: everything is stored in a single monolithic file. We could very well split bigints.nim into multiple nim files and include them all in bigints.nim. We should separate bigint initialisation, representation (toString), basic operations on it: +, -, *, /, shr, shl and advanced operations (operations that rely on simpler bigint operations): pow, powmod, gcd With the include system, it acts just like the same as the current single file. All the splitted files can go in a subdirectory, like the already existing src/bigints

Agreed.

Aug 16 '22 11:08 konsumlamm

@konsumlamm I have moved away the new function initRandomBigInt in a new file in bigints folder called utilities.nim, as requested. Can you validate the changes ? I have not checked whether it copies in place, but it should.

Sep 04 '22 17:09 dlesnoff

How about to add rnd: var Rand parameter to initRandomBigInt and randomizeBigInt procs, and generate random numbers using it? Doing so makes these procedures thread safe.

Sep 22 '22 22:09 demotomohiro

I did my own implementation (below), before I realized you already wrote one.

The size of a limb is an implementation detail, not a feature. RandomMode = Limbs should not be exposed.
I think the rand(lo..hi) is more in line with existing APIs and more intuitive.

For optimization purposes, if it's undesirable to construct ancillary BigInts to specify limits, it might make sense to have e.g. specialized types for arbitrary powers of two which just store an exponent and a sign (so you can do myBigInt -= fastPow2(900) as well as rand(0'bi..fastPow2(1000)).

import std/random
import bigints
import std/sugar
import std/options

proc rand(r: var Rand, x: Slice[BigInt]): BigInt =
  assert(x.a <= x.b, "empty slice")
  let spread = x.b - x.a
  if spread == 0'bi:
    return x.a
  let nbits = spread.fastLog2
  while true:
    let fulllimbs = nbits div 32
    var digits = collect:
      for i in 0 ..< fulllimbs:
        r.rand(uint32.low..uint32.high)
    let maxHighDigit = (spread shr (fulllimbs*32)).toInt[:uint32].get()
    digits.add(r.rand(uint32.low..maxHighDigit))
    result = initBigInt(digits)
    if result <= spread:
      break
  result += x.a

proc rand(x: Slice[BigInt]): BigInt = rand(randState(), x)

Oct 04 '22 18:10 rotu

@rotu

I did my own implementation (below), before I realized you already wrote one.

No problem here, it is always good to have a comparison.

The size of a limb is an implementation detail, not a feature. RandomMode = Limbs should not be exposed.

You missed my code motivation here. I wanted to have a helper function for the library development and unit testing first and foremost. I need to generate both, there are slight variations in the generation between the two, and I don't expose it in the public API (unless there is an error).

I think the rand(lo..hi) is more in line with existing APIs and more intuitive. This second way looked more difficult to generate in an uniformly distributed manner. Otherwise, it is indeed more untuitive (slices doesn't support steps though).

Quick comments on the code: Don't use std/sugar. It is always possible to do without. Just importing it, not even using it, slows down the compilation quite a bit. We don't want it, at least in the library, maybe for helper functions.

Why do you move out the conditional from the while loop ?

while true:
  ...
  if result <= spread:
    break

I still have to look closely, but the code looks simpler than mine, and I have carefully tried many problematic edge cases.

@demotomohiro I forgot! Glad to learn about thread safety, I have an applied mathematics degree and don't know much yet about thread safety. Sorry, but I have very few time to spare on the library.

Finally, I think that the function should have at least some test for the shape of sample's distribution (to check if its uniform). One way to do this would be to take the remainder of each generated bigint by a set of small integers (less than a limb, or just a dozen) and to look if this is uniform.

Oct 05 '22 22:10 dlesnoff

@dlesnoff

You terribly missed my code motivation here. I wanted to have a helper function for the library development and unit testing first and foremost. I need to generate both, there are slight variations in the generation between the two, and I don't expose it in the public API (unless there is an error).

Aha! Yes, I did miss your intention :-).

Quick comments on the code: Don't use std/sugar. It is always possible to do without. Just importing it, not even using it, slows down the compilation quite a bit. We don't want it, at least in the library, maybe for helper functions.

I've since discovered newSeqWith.

Why do you move out the conditional from the while loop ?

I'm not sure I understand the question. I would use a do-while loop, but Nim doesn't support them.

The loop body generates a uniformly distributed random number in the range 0..limit, where limit is a number slightly above spread (in particular, limit has the same first limb as spread and all the other limbs equal to uint32.high). Then I postselect to ensure the chosen value is actually less than spread without breaking uniformity.

import bigints
import std/sequtils
import std/options
import std/random

proc rand(r: var Rand, x: Slice[BigInt]): BigInt =
  assert(x.a <= x.b, "empty slice")
  let spread = x.b - x.a
  if spread == 0'bi:
    return x.a
  let
    nbits = spread.fastLog2
    nFullLimbs = nbits div 32
    maxHighDigit = (spread shr (nFullLimbs*32)).toInt[:uint32].get()
  while true:
    var digits = newSeqWith(nFullLimbs, r.rand(uint32.low..uint32.high))
    digits.add(r.rand(uint32.low..maxHighDigit))
    result = initBigInt(digits)
    if result <= spread:
      break
  result += x.a

proc rand(x: Slice[BigInt]): BigInt = rand(randState(), x)

Oct 06 '22 03:10 rotu

Oh, I see. First, excuse me, my message was rude.

I think your code would actually be a great addition to the lib. I thought the slice would be harder, but like you said, you just use rejection sampling.

If you do another PR with a bit of testing and make your function public, I'll review and approve it.

Oct 06 '22 21:10 dlesnoff