Nim
Nim copied to clipboard
initRand now uses strict monotonic counter to guarantee uniqueness
note
~~the CI failure seems to suggest getMonoTime
is in fact not monotonic on some OS? that's worrysome~~ (EDIT: that's expected; getCpuTicks would be strict monotonic at least on modern cpus)
future work
- [x] std/tempfiles should still not call
initRand()
on each call torandomPathName
but should instead use athreadvar
rand state, maybe (code doesn't look re-entrant) - [ ] docs should clarify whether getMonoTime can return same timestamp in 2 different threads; if so, then initRand should mix getMonoTime() with
getThreadId()
to guarantee uniqueness (EDIT: even within same thread there is no such guarantee because it's not strict monotonic; but getCpuTicks would be strict monotonic at least on modern cpus) - [x] investigate #18158 (marking checkbox because issue has a tracker)
- [ ] make
initRand()
work with--experimental:vmopsDanger
- [ ]
proc mach_absolute_time(): int64 {.importc, header: "<mach/mach.h>".}
in std/monotimes has wrong signature, differing from one in system/timers (see https://developer.apple.com/documentation/kernel/1462446-mach_absolute_time)
Just curious: why use the current time, rather than pulling from /dev/random or /dev/urandom (*nix), or CryptGenRandom (Windows)?
Just curious: why use the current time, rather than pulling from /dev/random or /dev/urandom (*nix), or CryptGenRandom (Windows)?
well we now have std/sysrand instead of having to call those directly, so it's an option, but IIRC performance can be a concern, definitely worth trying though; also, we need a vmops for whichever option is used in the end so that initRand()
can work at CT (minor, easy point). That said, a strict monotonic counter is always a good tool to have, regardless of availability of sysrand
Well then, I propose using the system's source of randomness as a seed. If someone needs, for some odd reason, to initialize thousands of state machines using a faster method, they can do so manually.
Using a monotonic clock, even a strict one, doesn't necessarily guarantee unique values for each call. A random source of data may return the same value consecutively, but that can't be known ahead of time (assuming the system implementation is sound).
PTAL, addressed comment; this PR is an improvement over status quo.
Well then, I propose using the system's source of randomness as a seed. If someone needs, for some odd reason, to initialize thousands of state machines using a faster method, they can do so manually.
the benchmark below shows that using urandom
from std/sysrand is 90x slower than getCpuTicks
(more details later, this should guarantee uniqueness) and 20x slower than getMonoTime
(from this PR) so urandom is not a good default IMO; user can always do that by calling initRand(getSeedFromUrandom)
import std/sysrand
import std/monotimes
import timn/exp/cputicks # cf upcoming PR
template algo1(buf, c) =
let ok = urandom(buf)
doAssert ok
c += cast[int](buf[0].addr)
template algo2(buf, c) =
let t = getCpuTicks()
c += cast[int](t)
template algo3(buf, c) =
let t = getMonoTime()
c += cast[int](t.ticks)
template mainAux(algo)=
let n = 10000
var buf: array[8, byte]
var c = 0
let t1 = getCpuTicks()
for i in 0..<n:
algo(buf, c)
let t2 = getCpuTicks()
echo (astToStr(algo), c, t2 - t1)
proc main()=
for i in 0..<10:
echo()
mainAux(algo1)
mainAux(algo2)
mainAux(algo3)
main()
the benchmark below shows that using
urandom
from std/sysrand is 90x slower thangetCpuTicks
(more details later, this should guarantee uniqueness) and 20x slower thangetMonoTime
(from this PR) so urandom is not a good default IMO; user can always do that by callinginitRand(getSeedFromUrandom)
So this seems to be an argument of performance vs correctness. Either we use a source of cryptographic randomness as the seed, or a high-resolution monotonic timer. Is this accurate?
So this seems to be an argument of performance vs correctness. Either we use a source of cryptographic randomness as the seed, or a high-resolution monotonic timer. Is this accurate?
yes, but i have a PR in the work that will give both performance and correctness; until then this PR is an improvement over status quo
If there's a PR in the works for a proper fix, there's little point in a PR for an improper one, unless there's something time-critical going on.
If there's a PR in the works for a proper fix, there's little point in a PR for an improper one, unless there's something time-critical going on.
with this logic, nothing ever gets done (it's not the 1st time). This PR is good enough to close #17898 given the clock resolution and cost of initRand, and improves several other things if you read the PR content. The PR I have in the work could potentially be controversial, who knows (and the other things in this PR are still useful regardless); there's no point in blocking on it when reusing existing getMonoTime
already improves status quo.
PTAL
This does not fix #17898, it just makes it less likely.
Multiple decisions need to be made here:
- To what degree should initRand prevent two calls in rapid succession from using the same seed?
- Should
randomPathName
only generate names for paths that don't exist?
In my opinion, it's up to the caller of randomPathName
to check that the path doesn't already exist. initRand
, because it isn't typically called very often, can get away with sacrificing performance for uniqueness, however no specific guarantee should be made regarding how unique.
I would also love to know why randomPathName is initializing a new random state each time. If you're going to do that, you might as well just use the current time anyway.
I would also love to know why randomPathName is initializing a new random state each time. If you're going to do that, you might as well just use the current time anyway.
I agree completely, this is not good.
3 possible venues:
- use atomics as done in std/oids which solves a very similar problem; can be augmented with thread-id + skipRandomNumbers to be robust to multiple threads returning same value
- use a {.threadvar.} Rand state
- expose API
getCpuTicks
from RDTSC instruction wherever it's available (refs https://github.com/timotheecour/Nim/issues/773, PR in the works), which is useful for many things (including as replacement for monotimes / cpuTime since it provides much higher resolution and lower overhead that those). This one is useful regardless of its use in this context. It provides guaranteed monotonicity within 1 thread (and by extension multiple threads since we can mix-in threadid; note that there's also an RDTSC instruction that returns the CPU id in addition to the tick). Portability is a potential concern but for benchmarking it's not a problem as it's an optional tool.
This PR may not fix #17898 completely, but it could be a improvement for random module.
@araq PTAL:
- EDIT: depends on https://github.com/nim-lang/Nim/pull/18743 (std/cputicks; see corresponding RFC in https://github.com/nim-lang/RFCs/issues/411); it introduces a new module
std/cputicks
providing cpu instruction-level granularity, with much higher precision and lower overhead than eithertimes.cpuTime
orstd/monotimes
(and less module import depdendencies), in particular strict monotonic instead of monotonic counters; this is what should be used for profiling (I'm using this in https://github.com/nim-lang/Nim/pull/15827 in not-yet-pushed commits which adds full profiling capability) and micro benchmarks (as opposed to looping N times over some code, which can skew results in various ways, affect the cache, register allocation, etc) ; the APIs are available in all backends (c, js, vm) - initRand now works in VM
- initRand now uses strict monotonic counter
- improve monotimes (eg for js etc)
which also addresses the previously raised concerns
example
- on OSX,
getCpuTicks
has 4.5X less overhead thangetMonoTime()
and 71X less overhead thancpuTime()
, see https://gist.github.com/timotheecour/e5d85be09b141b4bf9f877e1c8731025 (-d:case1
); in other OS's, the gap is even larger - on OS's other than OSX,
getMonoTime()
is not strictly monotonic and can't be used in a meaningful way to measure code under a certain number of instructions (see-d:case2
)
future work
- [ ] revisit https://github.com/nim-lang/Nim/pull/18729 by using instead the new
std/cputicks
links
- https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
- https://cpufun.substack.com/p/fun-with-timers-and-cpuid
This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.
It was done differently by https://github.com/nim-lang/Nim/pull/18744