quanta Comparing to `minstant`

Hi, I'm one of the authors of minstant, and minitrace based on minstant which is a fast tracing library used by TiKV. Similar to quanta, minstant is also based on the TSC. I'm considering migrating minitrace to use quanta but have found some blocking problems, which are:

quanta doesn't handle TSC deviation on the CPU cores. This problem can occur on some AMD chips. In minstant, the calibration will be executed on every core, and a correction for each core will be calculated once the deviation is detected.
The first call takes some time to calibrate the clock. In minstant, rust-ctor helps start the calibration at the start of the process.

Are these two problems possible to be fixed?

Jan 05 '22 19:01 andylokandy

Hi there!

I've actually looked at minstant before when checking out minitrace. Cool stuff, always fun to see how close to the metal you can make a thing. :)

I'll start with calibration delay. I wouldn't be against a PR that switches the global calibration to automatically run at process start. This one seems like the easiest to achieve in a cross-platform way given rust-ctor's statement of platform support.

As far as per-CPU core deviation: you're right, quanta doesn't currently per-core TSC offset compensation. That's currently an open-ish issue (#61) after an end user hit a weird bug where one of their cores had a very skewed TSC offset due a BIOS firmware issue.

My main reservation with adding such logic is that it should ideally be cross-platform, so ff the top of my head, we would need:

cross-platform way of getting list of CPU IDs (looks like cpu-affinity crate can do this)
cross-platform way of setting thread/CPU affinity (cpu-affinity can also do this)

That gets us a way to do the calibration on a per-CPU basis, although the value of TSC_AUX is still OS-dependent, so we could store per-CPU offsets for Linux, but nothing else.. so we'd probably end up mirroring what minstant has with TSCLevel::Stable vs TSCLevel::PerCPU.

All of this is a long way of saying: I think quanta could address these problems, but my biggest focus with the library is doing things with a cross-platform-first mentality, and avoiding making assumptions around things i.e. that the process has certain permissions to access files, or MSRs, etc.

I'm definitely open to PRs to try and incrementally improve the situation, but I'm not sure if I would personally be able to get these types of changes in a timeframe of anything less than multiple months.

Jan 05 '22 20:01 tobz

Thanks for your quick reply! I think we can't wait for multiple months. But why will it take that long as you have mentioned that rust-ctor and cpu-affinity are cross-platform?

Jan 06 '22 06:01 andylokandy

It wouldn't take multiple months of work, just that given the time I'm able to allocate to my open-source projects and the effort to implement and test these things to satisfactory level would likely end up taking me a few months.

(edited because my original response was unintentionally harsh.)

Jan 06 '22 13:01 tobz

Oh, I've just misunderstood. I thought you were saying it'll take several months to review the PR.

Definitely, we'd like to make a PR but we've decided to stick with minstant so far because we're going to publish minitrace-rust soon. We'll step back to quanta afterward.

Jan 06 '22 13:01 andylokandy

No worries. My response was unintentionally harsh and I've edited it. Sorry about that.

Hopefully, whenever it happens, you/me/we/whoever can work on this stuff and get it into quanta. It would definitely be a nice improvement. 👍🏻

Jan 06 '22 13:01 tobz

After fixing #82, I checked also minstant (the first number is a count of parallel threads):

        std             minstant       quanta
1:  1187.1±138.65ns   18.5±6.52ns    13.2±6.88ns 
2:  1433.9±158.59ns   31.3±9.89ns    16.9±6.11ns 
3:  1384.1±71.93ns    30.9±10.81ns   17.2±6.37ns 
4:  1407.1±159.88ns   28.7±10.84ns   16.9±5.78ns 
5:  1367.0±62.41ns    30.4±11.10ns   17.8±5.71ns 
6:  1411.6±167.61ns   30.6±10.93ns   17.3±5.68ns 
7:  1396.0±83.37ns    28.9±10.54ns   17.5±6.30ns 
8:  1390.6±81.05ns    29.4±11.40ns   17.2±5.62ns 
9:  1436.7±113.11ns   29.5±10.48ns   24.9±9.90ns 
10: 1399.3±84.62ns    29.1±11.33ns   25.8±9.92ns 
11: 1388.3±64.43ns    28.0±10.45ns   25.9±9.36ns 
12: 1395.2±60.53ns    25.5±9.21ns    24.0±7.32ns

It doesn't have problems with contention, but numbers differ from benchmarks in minstant's repository for basic cases for my CPU (AMD Ryzen 7 4800HS), that's interesting.

Also, minstant cannot replace quanta (for my case) until any sort of mocks is provided.

Mar 21 '23 06:03 loyd

Note: minstant::Instant stores raw values while quanta::Instant stores nanoseconds. So, scaling is happened in arithmetic operations in minstant and on now() in quanta. It's hard to say what's better; it depends on the case.

Thus, a better way to measure performance is to measure now() - now().

AMD Ryzen 7 4800HS quanta v0.12.1 (scales twice)

let start = QuantaInstant::now();
QuantaInstant::now() - start

quanta/now/1            time:   [18.714 ns 19.059 ns 19.528 ns]
quanta/now/2            time:   [26.202 ns 26.982 ns 27.838 ns]
quanta/now/3            time:   [25.401 ns 26.261 ns 27.163 ns]
quanta/now/4            time:   [24.358 ns 25.239 ns 26.217 ns]
quanta/now/5            time:   [25.785 ns 26.618 ns 27.588 ns]
quanta/now/6            time:   [24.935 ns 25.700 ns 26.587 ns]

quanta v0.12.1 (scales once, but less ergonomic API):

let start = clock.raw();
clock.delta(start, clock.raw())

quanta-clock/now/1      time:   [18.406 ns 18.993 ns 19.716 ns]
quanta-clock/now/2      time:   [22.828 ns 23.718 ns 24.671 ns]
quanta-clock/now/3      time:   [23.736 ns 24.624 ns 25.561 ns]
quanta-clock/now/4      time:   [23.236 ns 24.115 ns 25.102 ns]
quanta-clock/now/5      time:   [23.419 ns 24.194 ns 25.026 ns]
quanta-clock/now/6      time:   [23.108 ns 23.896 ns 24.817 ns]

minstant 0.1.4 (scales once)

let start = MinstantInstant::now();
MinstantInstant::now() - start

minstant/now/1          time:   [11.594 ns 11.958 ns 12.402 ns]
minstant/now/2          time:   [15.615 ns 16.226 ns 16.903 ns]
minstant/now/3          time:   [14.698 ns 15.304 ns 15.969 ns]
minstant/now/4          time:   [16.244 ns 16.552 ns 16.930 ns]
minstant/now/5          time:   [14.983 ns 15.534 ns 16.148 ns]
minstant/now/6          time:   [15.610 ns 16.146 ns 16.771 ns]

So, now minstant is better in terms of performance.

If talking about functional comparison:

quanta supports more platforms
quanta provides mock API and clock overrides
minstant allows some patterns that aren't supported by quanta

For the last point, let's consider a situation where we just want to check whether some time has elapsed or not. I minstant it's possible to calculate the deadline in cycles:

let deadline = Instant::now() + Duration::from_secs(2) // stores ticks

and then checks against it multiple times:

Instant::now() >= deadline // no scaling at all

In quanta, it seems impossible because there is no time -> ticks conversion, only ticks -> time. Thus, the scaling will happen in any case. Although delta can be used to calculate scale, it's an implementation detail.

Dec 06 '23 08:12 loyd

That's a good comparison.

I'm not sure we could make a change to defer the scaling arithmetic unless we also included the scaling factor/shift in Instant itself... which might become a no-op performance change due to the increase in size for Instant. minstant has the advantage here of being able to avoid this since it's just dealing with static variables to access the scaling factor, where we're implicitly tied to a specific Clock.

Dec 06 '23 13:12 tobz

I'm not sure we could make a change to defer the scaling arithmetic

I'm too. For instance, I have both cases: one now() and multiple ops and the opposite. So, I think it should be specified in the documentation.

However, I think quanta should provide access to the current clock to simplify

let start = clock.raw();
clock.delta(start, clock.raw())

Otherwise, it requires manual OnceLock + custom Clock and implementation mocks.

Will you accept ~~quanta::default_clock() -> &'static Clock~~ or quanta::with_default_clock(|clock: &Clock| ...)?

Dec 07 '23 07:12 loyd

quanta quanta copied to clipboard

Comparing to `minstant`

quanta
quanta copied to clipboard