quanta
quanta copied to clipboard
Comparing to `minstant`
Hi, I'm one of the authors of minstant, and minitrace based on minstant which is a fast tracing library used by TiKV. Similar to quanta, minstant is also based on the TSC. I'm considering migrating minitrace to use quanta but have found some blocking problems, which are:
quantadoesn't handle TSC deviation on the CPU cores. This problem can occur on some AMD chips. Inminstant, the calibration will be executed on every core, and a correction for each core will be calculated once the deviation is detected.- The first call takes some time to calibrate the clock. In
minstant,rust-ctorhelps start the calibration at the start of the process.
Are these two problems possible to be fixed?
Hi there!
I've actually looked at minstant before when checking out minitrace. Cool stuff, always fun to see how close to the metal you can make a thing. :)
I'll start with calibration delay. I wouldn't be against a PR that switches the global calibration to automatically run at process start. This one seems like the easiest to achieve in a cross-platform way given rust-ctor's statement of platform support.
As far as per-CPU core deviation: you're right, quanta doesn't currently per-core TSC offset compensation. That's currently an open-ish issue (#61) after an end user hit a weird bug where one of their cores had a very skewed TSC offset due a BIOS firmware issue.
My main reservation with adding such logic is that it should ideally be cross-platform, so ff the top of my head, we would need:
- cross-platform way of getting list of CPU IDs (looks like
cpu-affinitycrate can do this) - cross-platform way of setting thread/CPU affinity (
cpu-affinitycan also do this)
That gets us a way to do the calibration on a per-CPU basis, although the value of TSC_AUX is still OS-dependent, so we could store per-CPU offsets for Linux, but nothing else.. so we'd probably end up mirroring what minstant has with TSCLevel::Stable vs TSCLevel::PerCPU.
All of this is a long way of saying: I think quanta could address these problems, but my biggest focus with the library is doing things with a cross-platform-first mentality, and avoiding making assumptions around things i.e. that the process has certain permissions to access files, or MSRs, etc.
I'm definitely open to PRs to try and incrementally improve the situation, but I'm not sure if I would personally be able to get these types of changes in a timeframe of anything less than multiple months.
Thanks for your quick reply! I think we can't wait for multiple months. But why will it take that long as you have mentioned that rust-ctor and cpu-affinity are cross-platform?
It wouldn't take multiple months of work, just that given the time I'm able to allocate to my open-source projects and the effort to implement and test these things to satisfactory level would likely end up taking me a few months.
(edited because my original response was unintentionally harsh.)
Oh, I've just misunderstood. I thought you were saying it'll take several months to review the PR.
Definitely, we'd like to make a PR but we've decided to stick with minstant so far because we're going to publish minitrace-rust soon. We'll step back to quanta afterward.
No worries. My response was unintentionally harsh and I've edited it. Sorry about that.
Hopefully, whenever it happens, you/me/we/whoever can work on this stuff and get it into quanta. It would definitely be a nice improvement. 👍🏻
After fixing #82, I checked also minstant (the first number is a count of parallel threads):
std minstant quanta
1: 1187.1±138.65ns 18.5±6.52ns 13.2±6.88ns
2: 1433.9±158.59ns 31.3±9.89ns 16.9±6.11ns
3: 1384.1±71.93ns 30.9±10.81ns 17.2±6.37ns
4: 1407.1±159.88ns 28.7±10.84ns 16.9±5.78ns
5: 1367.0±62.41ns 30.4±11.10ns 17.8±5.71ns
6: 1411.6±167.61ns 30.6±10.93ns 17.3±5.68ns
7: 1396.0±83.37ns 28.9±10.54ns 17.5±6.30ns
8: 1390.6±81.05ns 29.4±11.40ns 17.2±5.62ns
9: 1436.7±113.11ns 29.5±10.48ns 24.9±9.90ns
10: 1399.3±84.62ns 29.1±11.33ns 25.8±9.92ns
11: 1388.3±64.43ns 28.0±10.45ns 25.9±9.36ns
12: 1395.2±60.53ns 25.5±9.21ns 24.0±7.32ns
It doesn't have problems with contention, but numbers differ from benchmarks in minstant's repository for basic cases for my CPU (AMD Ryzen 7 4800HS), that's interesting.
Also, minstant cannot replace quanta (for my case) until any sort of mocks is provided.
Note: minstant::Instant stores raw values while quanta::Instant stores nanoseconds. So, scaling is happened in arithmetic operations in minstant and on now() in quanta. It's hard to say what's better; it depends on the case.
Thus, a better way to measure performance is to measure now() - now().
AMD Ryzen 7 4800HS quanta v0.12.1 (scales twice)
let start = QuantaInstant::now();
QuantaInstant::now() - start
quanta/now/1 time: [18.714 ns 19.059 ns 19.528 ns]
quanta/now/2 time: [26.202 ns 26.982 ns 27.838 ns]
quanta/now/3 time: [25.401 ns 26.261 ns 27.163 ns]
quanta/now/4 time: [24.358 ns 25.239 ns 26.217 ns]
quanta/now/5 time: [25.785 ns 26.618 ns 27.588 ns]
quanta/now/6 time: [24.935 ns 25.700 ns 26.587 ns]
quanta v0.12.1 (scales once, but less ergonomic API):
let start = clock.raw();
clock.delta(start, clock.raw())
quanta-clock/now/1 time: [18.406 ns 18.993 ns 19.716 ns]
quanta-clock/now/2 time: [22.828 ns 23.718 ns 24.671 ns]
quanta-clock/now/3 time: [23.736 ns 24.624 ns 25.561 ns]
quanta-clock/now/4 time: [23.236 ns 24.115 ns 25.102 ns]
quanta-clock/now/5 time: [23.419 ns 24.194 ns 25.026 ns]
quanta-clock/now/6 time: [23.108 ns 23.896 ns 24.817 ns]
minstant 0.1.4 (scales once)
let start = MinstantInstant::now();
MinstantInstant::now() - start
minstant/now/1 time: [11.594 ns 11.958 ns 12.402 ns]
minstant/now/2 time: [15.615 ns 16.226 ns 16.903 ns]
minstant/now/3 time: [14.698 ns 15.304 ns 15.969 ns]
minstant/now/4 time: [16.244 ns 16.552 ns 16.930 ns]
minstant/now/5 time: [14.983 ns 15.534 ns 16.148 ns]
minstant/now/6 time: [15.610 ns 16.146 ns 16.771 ns]
So, now minstant is better in terms of performance.
If talking about functional comparison:
- quanta supports more platforms
- quanta provides mock API and clock overrides
- minstant allows some patterns that aren't supported by quanta
For the last point, let's consider a situation where we just want to check whether some time has elapsed or not. I minstant it's possible to calculate the deadline in cycles:
let deadline = Instant::now() + Duration::from_secs(2) // stores ticks
and then checks against it multiple times:
Instant::now() >= deadline // no scaling at all
In quanta, it seems impossible because there is no time -> ticks conversion, only ticks -> time. Thus, the scaling will happen in any case. Although delta can be used to calculate scale, it's an implementation detail.
That's a good comparison.
I'm not sure we could make a change to defer the scaling arithmetic unless we also included the scaling factor/shift in Instant itself... which might become a no-op performance change due to the increase in size for Instant. minstant has the advantage here of being able to avoid this since it's just dealing with static variables to access the scaling factor, where we're implicitly tied to a specific Clock.
I'm not sure we could make a change to defer the scaling arithmetic
I'm too. For instance, I have both cases: one now() and multiple ops and the opposite. So, I think it should be specified in the documentation.
However, I think quanta should provide access to the current clock to simplify
let start = clock.raw();
clock.delta(start, clock.raw())
Otherwise, it requires manual OnceLock + custom Clock and implementation mocks.
Will you accept ~~quanta::default_clock() -> &'static Clock~~ or quanta::with_default_clock(|clock: &Clock| ...)?