Nim icon indicating copy to clipboard operation
Nim copied to clipboard

initRand now uses strict monotonic counter to guarantee uniqueness

Open timotheecour opened this issue 3 years ago • 16 comments

note

~~the CI failure seems to suggest getMonoTime is in fact not monotonic on some OS? that's worrysome~~ (EDIT: that's expected; getCpuTicks would be strict monotonic at least on modern cpus)

future work

  • [x] std/tempfiles should still not call initRand() on each call to randomPathName but should instead use a threadvar rand state, maybe (code doesn't look re-entrant)
  • [ ] docs should clarify whether getMonoTime can return same timestamp in 2 different threads; if so, then initRand should mix getMonoTime() with getThreadId() to guarantee uniqueness (EDIT: even within same thread there is no such guarantee because it's not strict monotonic; but getCpuTicks would be strict monotonic at least on modern cpus)
  • [x] investigate #18158 (marking checkbox because issue has a tracker)
  • [ ] make initRand() work with --experimental:vmopsDanger
  • [ ] proc mach_absolute_time(): int64 {.importc, header: "<mach/mach.h>".} in std/monotimes has wrong signature, differing from one in system/timers (see https://developer.apple.com/documentation/kernel/1462446-mach_absolute_time)

timotheecour avatar Jun 02 '21 05:06 timotheecour

Just curious: why use the current time, rather than pulling from /dev/random or /dev/urandom (*nix), or CryptGenRandom (Windows)?

Varriount avatar Jun 03 '21 00:06 Varriount

Just curious: why use the current time, rather than pulling from /dev/random or /dev/urandom (*nix), or CryptGenRandom (Windows)?

well we now have std/sysrand instead of having to call those directly, so it's an option, but IIRC performance can be a concern, definitely worth trying though; also, we need a vmops for whichever option is used in the end so that initRand() can work at CT (minor, easy point). That said, a strict monotonic counter is always a good tool to have, regardless of availability of sysrand

timotheecour avatar Jun 03 '21 00:06 timotheecour

Well then, I propose using the system's source of randomness as a seed. If someone needs, for some odd reason, to initialize thousands of state machines using a faster method, they can do so manually.

Using a monotonic clock, even a strict one, doesn't necessarily guarantee unique values for each call. A random source of data may return the same value consecutively, but that can't be known ahead of time (assuming the system implementation is sound).

Varriount avatar Jun 03 '21 08:06 Varriount

PTAL, addressed comment; this PR is an improvement over status quo.

Well then, I propose using the system's source of randomness as a seed. If someone needs, for some odd reason, to initialize thousands of state machines using a faster method, they can do so manually.

the benchmark below shows that using urandom from std/sysrand is 90x slower than getCpuTicks (more details later, this should guarantee uniqueness) and 20x slower than getMonoTime (from this PR) so urandom is not a good default IMO; user can always do that by calling initRand(getSeedFromUrandom)

import std/sysrand
import std/monotimes
import timn/exp/cputicks # cf upcoming PR

template algo1(buf, c) =
  let ok = urandom(buf)
  doAssert ok
  c += cast[int](buf[0].addr)

template algo2(buf, c) =
  let t = getCpuTicks()
  c += cast[int](t)

template algo3(buf, c) =
  let t = getMonoTime()
  c += cast[int](t.ticks)

template mainAux(algo)=
  let n = 10000
  var buf: array[8, byte]
  var c = 0
  let t1 = getCpuTicks()
  for i in 0..<n:
    algo(buf, c)
  let t2 = getCpuTicks()
  echo (astToStr(algo), c, t2 - t1)

proc main()=
  for i in 0..<10:
    echo()
    mainAux(algo1)
    mainAux(algo2)
    mainAux(algo3)
main()

timotheecour avatar Jun 03 '21 20:06 timotheecour

the benchmark below shows that using urandom from std/sysrand is 90x slower than getCpuTicks (more details later, this should guarantee uniqueness) and 20x slower than getMonoTime (from this PR) so urandom is not a good default IMO; user can always do that by calling initRand(getSeedFromUrandom)

So this seems to be an argument of performance vs correctness. Either we use a source of cryptographic randomness as the seed, or a high-resolution monotonic timer. Is this accurate?

Varriount avatar Jun 04 '21 00:06 Varriount

So this seems to be an argument of performance vs correctness. Either we use a source of cryptographic randomness as the seed, or a high-resolution monotonic timer. Is this accurate?

yes, but i have a PR in the work that will give both performance and correctness; until then this PR is an improvement over status quo

timotheecour avatar Jun 04 '21 00:06 timotheecour

If there's a PR in the works for a proper fix, there's little point in a PR for an improper one, unless there's something time-critical going on.

Varriount avatar Jun 04 '21 00:06 Varriount

If there's a PR in the works for a proper fix, there's little point in a PR for an improper one, unless there's something time-critical going on.

with this logic, nothing ever gets done (it's not the 1st time). This PR is good enough to close #17898 given the clock resolution and cost of initRand, and improves several other things if you read the PR content. The PR I have in the work could potentially be controversial, who knows (and the other things in this PR are still useful regardless); there's no point in blocking on it when reusing existing getMonoTime already improves status quo.

timotheecour avatar Jun 04 '21 00:06 timotheecour

PTAL

timotheecour avatar Jun 06 '21 08:06 timotheecour

This does not fix #17898, it just makes it less likely.

Multiple decisions need to be made here:

  • To what degree should initRand prevent two calls in rapid succession from using the same seed?
  • Should randomPathName only generate names for paths that don't exist?

In my opinion, it's up to the caller of randomPathName to check that the path doesn't already exist. initRand, because it isn't typically called very often, can get away with sacrificing performance for uniqueness, however no specific guarantee should be made regarding how unique.

Varriount avatar Jun 06 '21 09:06 Varriount

I would also love to know why randomPathName is initializing a new random state each time. If you're going to do that, you might as well just use the current time anyway.

Varriount avatar Jun 11 '21 06:06 Varriount

I would also love to know why randomPathName is initializing a new random state each time. If you're going to do that, you might as well just use the current time anyway.

I agree completely, this is not good.

Araq avatar Jun 11 '21 06:06 Araq

3 possible venues:

  • use atomics as done in std/oids which solves a very similar problem; can be augmented with thread-id + skipRandomNumbers to be robust to multiple threads returning same value
  • use a {.threadvar.} Rand state
  • expose API getCpuTicks from RDTSC instruction wherever it's available (refs https://github.com/timotheecour/Nim/issues/773, PR in the works), which is useful for many things (including as replacement for monotimes / cpuTime since it provides much higher resolution and lower overhead that those). This one is useful regardless of its use in this context. It provides guaranteed monotonicity within 1 thread (and by extension multiple threads since we can mix-in threadid; note that there's also an RDTSC instruction that returns the CPU id in addition to the tick). Portability is a potential concern but for benchmarking it's not a problem as it's an optional tool.

timotheecour avatar Jul 01 '21 07:07 timotheecour

This PR may not fix #17898 completely, but it could be a improvement for random module.

ringabout avatar Aug 22 '21 04:08 ringabout

@araq PTAL:

  • EDIT: depends on https://github.com/nim-lang/Nim/pull/18743 (std/cputicks; see corresponding RFC in https://github.com/nim-lang/RFCs/issues/411); it introduces a new module std/cputicks providing cpu instruction-level granularity, with much higher precision and lower overhead than either times.cpuTime or std/monotimes (and less module import depdendencies), in particular strict monotonic instead of monotonic counters; this is what should be used for profiling (I'm using this in https://github.com/nim-lang/Nim/pull/15827 in not-yet-pushed commits which adds full profiling capability) and micro benchmarks (as opposed to looping N times over some code, which can skew results in various ways, affect the cache, register allocation, etc) ; the APIs are available in all backends (c, js, vm)
  • initRand now works in VM
  • initRand now uses strict monotonic counter
  • improve monotimes (eg for js etc)

which also addresses the previously raised concerns

example

  • on OSX, getCpuTicks has 4.5X less overhead than getMonoTime() and 71X less overhead than cpuTime(), see https://gist.github.com/timotheecour/e5d85be09b141b4bf9f877e1c8731025 (-d:case1); in other OS's, the gap is even larger
  • on OS's other than OSX, getMonoTime() is not strictly monotonic and can't be used in a meaningful way to measure code under a certain number of instructions (see -d:case2)

future work

  • [ ] revisit https://github.com/nim-lang/Nim/pull/18729 by using instead the new std/cputicks

links

  • https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
  • https://cpufun.substack.com/p/fun-with-timers-and-cpuid

timotheecour avatar Aug 23 '21 23:08 timotheecour

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

stale[bot] avatar Sep 08 '22 23:09 stale[bot]

It was done differently by https://github.com/nim-lang/Nim/pull/18744

ringabout avatar Sep 23 '22 02:09 ringabout