Nim
Nim copied to clipboard
initRand now uses strict monotonic counter to guarantee uniqueness
note
~~the CI failure seems to suggest getMonoTime is in fact not monotonic on some OS? that's worrysome~~ (EDIT: that's expected; getCpuTicks would be strict monotonic at least on modern cpus)
future work
- [x] std/tempfiles should still not call
initRand()on each call torandomPathNamebut should instead use athreadvarrand state, maybe (code doesn't look re-entrant) - [ ] docs should clarify whether getMonoTime can return same timestamp in 2 different threads; if so, then initRand should mix getMonoTime() with
getThreadId()to guarantee uniqueness (EDIT: even within same thread there is no such guarantee because it's not strict monotonic; but getCpuTicks would be strict monotonic at least on modern cpus) - [x] investigate #18158 (marking checkbox because issue has a tracker)
- [ ] make
initRand()work with--experimental:vmopsDanger - [ ]
proc mach_absolute_time(): int64 {.importc, header: "<mach/mach.h>".}in std/monotimes has wrong signature, differing from one in system/timers (see https://developer.apple.com/documentation/kernel/1462446-mach_absolute_time)
Just curious: why use the current time, rather than pulling from /dev/random or /dev/urandom (*nix), or CryptGenRandom (Windows)?
Just curious: why use the current time, rather than pulling from /dev/random or /dev/urandom (*nix), or CryptGenRandom (Windows)?
well we now have std/sysrand instead of having to call those directly, so it's an option, but IIRC performance can be a concern, definitely worth trying though; also, we need a vmops for whichever option is used in the end so that initRand() can work at CT (minor, easy point). That said, a strict monotonic counter is always a good tool to have, regardless of availability of sysrand
Well then, I propose using the system's source of randomness as a seed. If someone needs, for some odd reason, to initialize thousands of state machines using a faster method, they can do so manually.
Using a monotonic clock, even a strict one, doesn't necessarily guarantee unique values for each call. A random source of data may return the same value consecutively, but that can't be known ahead of time (assuming the system implementation is sound).
PTAL, addressed comment; this PR is an improvement over status quo.
Well then, I propose using the system's source of randomness as a seed. If someone needs, for some odd reason, to initialize thousands of state machines using a faster method, they can do so manually.
the benchmark below shows that using urandom from std/sysrand is 90x slower than getCpuTicks (more details later, this should guarantee uniqueness) and 20x slower than getMonoTime (from this PR) so urandom is not a good default IMO; user can always do that by calling initRand(getSeedFromUrandom)
import std/sysrand
import std/monotimes
import timn/exp/cputicks # cf upcoming PR
template algo1(buf, c) =
let ok = urandom(buf)
doAssert ok
c += cast[int](buf[0].addr)
template algo2(buf, c) =
let t = getCpuTicks()
c += cast[int](t)
template algo3(buf, c) =
let t = getMonoTime()
c += cast[int](t.ticks)
template mainAux(algo)=
let n = 10000
var buf: array[8, byte]
var c = 0
let t1 = getCpuTicks()
for i in 0..<n:
algo(buf, c)
let t2 = getCpuTicks()
echo (astToStr(algo), c, t2 - t1)
proc main()=
for i in 0..<10:
echo()
mainAux(algo1)
mainAux(algo2)
mainAux(algo3)
main()
the benchmark below shows that using
urandomfrom std/sysrand is 90x slower thangetCpuTicks(more details later, this should guarantee uniqueness) and 20x slower thangetMonoTime(from this PR) so urandom is not a good default IMO; user can always do that by callinginitRand(getSeedFromUrandom)
So this seems to be an argument of performance vs correctness. Either we use a source of cryptographic randomness as the seed, or a high-resolution monotonic timer. Is this accurate?
So this seems to be an argument of performance vs correctness. Either we use a source of cryptographic randomness as the seed, or a high-resolution monotonic timer. Is this accurate?
yes, but i have a PR in the work that will give both performance and correctness; until then this PR is an improvement over status quo
If there's a PR in the works for a proper fix, there's little point in a PR for an improper one, unless there's something time-critical going on.
If there's a PR in the works for a proper fix, there's little point in a PR for an improper one, unless there's something time-critical going on.
with this logic, nothing ever gets done (it's not the 1st time). This PR is good enough to close #17898 given the clock resolution and cost of initRand, and improves several other things if you read the PR content. The PR I have in the work could potentially be controversial, who knows (and the other things in this PR are still useful regardless); there's no point in blocking on it when reusing existing getMonoTime already improves status quo.
PTAL
This does not fix #17898, it just makes it less likely.
Multiple decisions need to be made here:
- To what degree should initRand prevent two calls in rapid succession from using the same seed?
- Should
randomPathNameonly generate names for paths that don't exist?
In my opinion, it's up to the caller of randomPathName to check that the path doesn't already exist. initRand, because it isn't typically called very often, can get away with sacrificing performance for uniqueness, however no specific guarantee should be made regarding how unique.
I would also love to know why randomPathName is initializing a new random state each time. If you're going to do that, you might as well just use the current time anyway.
I would also love to know why randomPathName is initializing a new random state each time. If you're going to do that, you might as well just use the current time anyway.
I agree completely, this is not good.
3 possible venues:
- use atomics as done in std/oids which solves a very similar problem; can be augmented with thread-id + skipRandomNumbers to be robust to multiple threads returning same value
- use a {.threadvar.} Rand state
- expose API
getCpuTicksfrom RDTSC instruction wherever it's available (refs https://github.com/timotheecour/Nim/issues/773, PR in the works), which is useful for many things (including as replacement for monotimes / cpuTime since it provides much higher resolution and lower overhead that those). This one is useful regardless of its use in this context. It provides guaranteed monotonicity within 1 thread (and by extension multiple threads since we can mix-in threadid; note that there's also an RDTSC instruction that returns the CPU id in addition to the tick). Portability is a potential concern but for benchmarking it's not a problem as it's an optional tool.
This PR may not fix #17898 completely, but it could be a improvement for random module.
@araq PTAL:
- EDIT: depends on https://github.com/nim-lang/Nim/pull/18743 (std/cputicks; see corresponding RFC in https://github.com/nim-lang/RFCs/issues/411); it introduces a new module
std/cputicksproviding cpu instruction-level granularity, with much higher precision and lower overhead than eithertimes.cpuTimeorstd/monotimes(and less module import depdendencies), in particular strict monotonic instead of monotonic counters; this is what should be used for profiling (I'm using this in https://github.com/nim-lang/Nim/pull/15827 in not-yet-pushed commits which adds full profiling capability) and micro benchmarks (as opposed to looping N times over some code, which can skew results in various ways, affect the cache, register allocation, etc) ; the APIs are available in all backends (c, js, vm) - initRand now works in VM
- initRand now uses strict monotonic counter
- improve monotimes (eg for js etc)
which also addresses the previously raised concerns
example
- on OSX,
getCpuTickshas 4.5X less overhead thangetMonoTime()and 71X less overhead thancpuTime(), see https://gist.github.com/timotheecour/e5d85be09b141b4bf9f877e1c8731025 (-d:case1); in other OS's, the gap is even larger - on OS's other than OSX,
getMonoTime()is not strictly monotonic and can't be used in a meaningful way to measure code under a certain number of instructions (see-d:case2)
future work
- [ ] revisit https://github.com/nim-lang/Nim/pull/18729 by using instead the new
std/cputicks
links
- https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
- https://cpufun.substack.com/p/fun-with-timers-and-cpuid
This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.
It was done differently by https://github.com/nim-lang/Nim/pull/18744