
Investigate migrating `Duration`'s internal representation from `FiniteF64` to `i64` or potentially `u32` and `u64`.

Open nekevss opened this issue 10 months ago • 18 comments

Currently, Duration uses FiniteF64 for its fields. It would be nice to move away from FiniteF64 for the internal representation and use i64, or u32 and u64.

Below are some pseudo representations (not including DateDuration or TimeDuration):

struct Duration {
    years: i64,
    months: i64,
    weeks: i64,
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
    milliseconds: i64,
    microseconds: i64,
    nanoseconds: i64,
}
struct Duration {
    sign: Sign,
    years: u32,
    months: u32,
    weeks: u32,
    days: u64,
    hours: u64,
    minutes: u64,
    seconds: u64,
    milliseconds: u64,
    microseconds: u64,
    nanoseconds: u64,
}

General things that need to be considered:

  1. How will this interface with engines where inputs may be anywhere in the range up to 2^53 - 1?
  2. Is there any point in the operations of the Temporal specification where precision on an unbalanced duration may matter?
  3. Should DateDuration and TimeDuration be preserved, or dropped in favor of using the above as the external Duration plus a new "internal Duration" for calculation purposes?

nekevss avatar Jan 28 '25 15:01 nekevss

Update on the progress here.

Currently, the Temporal proposal relies on floating point arithmetic behavior for integers above MAX_SAFE_INTEGER when validating a Duration in IsValidDuration. This may also extend to the add and subtract operations. That complicates the representation that can be used for milliseconds, microseconds, and nanoseconds, because both the microsecond and nanosecond maxima exceed the range of i64 and u64.

| Unit | Largest valid Duration value |
| --- | --- |
| nanosecond | 9_007_199_254_740_991_463_129_087 |
| microsecond | 9_007_199_254_740_991_475_711 |
| millisecond | 9_007_199_254_740_991_487 |

Note the size of i64::MAX and u64::MAX vs. the largest microsecond value:

    9_223_372_036_854_775_807 // i64
   18_446_744_073_709_551_615 // u64
9_007_199_254_740_991_475_711 // largest micros
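The mismatch is easy to check mechanically; a small sketch using i128, which is wide enough to hold all three maxima:

```rust
fn main() {
    // Largest valid microsecond value from the table above.
    let largest_micros: i128 = 9_007_199_254_740_991_475_711;

    // It overflows both i64 and u64, but fits comfortably in i128.
    assert!(largest_micros > i64::MAX as i128);
    assert!(largest_micros > u64::MAX as i128);
    assert!(largest_micros < i128::MAX);
}
```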

nekevss avatar May 07 '25 02:05 nekevss

@nekevss Is that intentional in the spec, or can we ask the champions to relax the range requirements for microseconds and nanoseconds to fit the range of an i64?

jedel1043 avatar May 07 '25 17:05 jedel1043

The issue filed on the proposal is here:

https://github.com/tc39/proposal-temporal/issues/3106

We couldn't have nanoseconds be in the i64 range even if we wanted to (I believe it wouldn't be able to support the JS date range). There's potential for microseconds to be in an i64 range if the check was shifted from seconds to milliseconds. I'm not sure if that will happen.

I think the best case scenario, if there were updates, would be:

struct Duration {
    years: i32,
    months: i32,
    weeks: i32,
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
    milliseconds: i64,
    microseconds: i64,
    nanoseconds: i128,
}

They would have to adjust the validity checks on years, months, and weeks and shift the time unit check. It would eliminate one of the max duration options and save 4 bytes in the best case.

Another option would be to specify the fields differently depending on feature flags: keep the floating point for JavaScript, and lessen the range for native Rust. I'm not really sure, tbh. It's a really strange type to optimize.
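A rough sketch of how that flagging could look; the "float-repr" feature name is invented for illustration, nothing like it exists today:

```rust
// Hypothetical feature-flagged representation: the field type is chosen
// at compile time depending on an (invented) "float-repr" cargo feature.
#[cfg(feature = "float-repr")]
type DurationField = f64; // keep the full JS-compatible range
#[cfg(not(feature = "float-repr"))]
type DurationField = i64; // narrower range for native Rust callers

#[allow(dead_code)]
struct Duration {
    years: DurationField,
    // ...remaining nine fields elided
}

fn main() {
    // Either way the field is 8 bytes wide.
    assert_eq!(std::mem::size_of::<DurationField>(), 8);
}
```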

Any ideas?

EDIT: also, interestingly, I think the unsigned version would be more bytes, not fewer ... although I haven't actually checked the layout and alignment, so that may throw a hitch into which is better.

nekevss avatar May 07 '25 17:05 nekevss

Wild idea, what if we use something like:

struct Duration {
    years: i32,
    months: i32,
    weeks: i32,
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
    milliseconds: i64,
    microseconds: i80,
    nanoseconds: i128,
}

It should cover the larger microseconds range, and thanks to alignment it should still have the same size as the original Duration. I think the bnum crate implements a packed i80 type which uses five u16s instead of an i128.

jedel1043 avatar May 07 '25 18:05 jedel1043

  1. That's super interesting, and I'm going to be taking a look at that crate.

  2. It's an option, but we currently still have to use i64 for years, months, and weeks unless we change to the unsigned type, so the type is still larger than it could be.

nekevss avatar May 07 '25 18:05 nekevss

Also, just to note: we can use an i80 and an i88 to represent the current fields if we are trying to be super efficient.

2^87 = 154_742_504_910_672_534_362_390_528 // i88 range: nanoseconds fit!
2^79 =     604_462_909_807_314_587_353_088 // i80 range: microseconds fit!
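Both claims check out against the maxima from the table earlier in the thread; a quick sketch in i128:

```rust
fn main() {
    // Largest valid nanosecond and microsecond values from the table above.
    let max_nanos: i128 = 9_007_199_254_740_991_463_129_087;
    let max_micros: i128 = 9_007_199_254_740_991_475_711;

    // An i88 covers up to 2^87 - 1 and an i80 up to 2^79 - 1.
    assert!(max_nanos < 1_i128 << 87); // nanoseconds fit in an i88
    assert!(max_micros < 1_i128 << 79); // microseconds fit in an i80

    // Neither fits in an i64.
    assert!(max_nanos > i64::MAX as i128);
    assert!(max_micros > i64::MAX as i128);
}
```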

nekevss avatar May 07 '25 20:05 nekevss

When designing the proposal's current limits we did not expect that any implementation would want to store the components for the user-facing Temporal.Duration as anything but 10x f64, because those f64s are exposed directly as user-facing values of Temporal.Duration's properties. So having the Duration struct be a good Rust-programmer-facing API as well as a backing store for the JS object, is a use case we honestly did not anticipate 😄

If you use bnum, take the smaller maximum values for days, hours, and minutes into account, and store the sign as a separate bit, you could have something like the following:

struct Duration {
    years: u32,
    months: u32,
    weeks: u32,
    days: u40,
    hours: u48,
    minutes: u48,
    seconds: u56,
    milliseconds: u64,
    microseconds: u80,
    nanoseconds: u88,
    sign: bool,
}

If I'm counting right that would give you a size of 72 bytes (assuming 8 byte padding).
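The 72-byte figure can be sanity-checked by summing the field widths, assuming the bnum types pack to exactly their nominal width with no per-field padding:

```rust
fn main() {
    // Bit widths in declaration order: years, months, weeks, days, hours,
    // minutes, seconds, milliseconds, microseconds, nanoseconds, sign.
    let bit_widths = [32u32, 32, 32, 40, 48, 48, 56, 64, 80, 88, 8];
    let packed_bytes: u32 = bit_widths.iter().map(|b| b / 8).sum();
    assert_eq!(packed_bytes, 66);
    // Rounded up to 8-byte alignment, matching the 72-byte estimate.
    assert_eq!((packed_bytes + 7) / 8 * 8, 72);
}
```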

Another idea that we handwavingly threw around is that Durations will be single-component in the most common case. So if you weren't attached to moving away from floats but still wanted to reduce wasted space in the struct, you could do something like this:

enum DurationComponent {
    Years,
    Months,
    Weeks,
    Days,
    Hours,
    Minutes,
    Seconds,
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

enum Duration {
    SingleComponent { value: f64, component: DurationComponent },
    FullComponents { values: Box<[f64; 10]> },
}

Which would give you a struct size of 16 bytes at the cost of having to allocate an extra 80 bytes in the less common case.
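The 16-byte figure holds on a 64-bit target; the inline variant is an f64 plus a one-byte component tag padded to f64 alignment, and the boxed variant is a single pointer:

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum DurationComponent {
    Years, Months, Weeks, Days, Hours,
    Minutes, Seconds, Milliseconds, Microseconds, Nanoseconds,
}

#[allow(dead_code)]
enum Duration {
    SingleComponent { value: f64, component: DurationComponent },
    FullComponents { values: Box<[f64; 10]> },
}

fn main() {
    // 8 bytes of f64 + 1 byte of component + discriminant, padded to 16.
    assert_eq!(size_of::<Duration>(), 16);
    // The boxed payload is the 80 bytes allocated in the less common case.
    assert_eq!(size_of::<[f64; 10]>(), 80);
}
```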

ptomato avatar May 16 '25 19:05 ptomato

If you use bnum, take the smaller maximum values for days, hours, and minutes into account, and store the sign as a separate bit, you could have something like the following:

struct Duration {
    years: u32,
    months: u32,
    weeks: u32,
    days: u40,
    hours: u48,
    minutes: u48,
    seconds: u56,
    milliseconds: u64,
    microseconds: u80,
    nanoseconds: u88,
    sign: bool,
}

If I'm counting right that would give you a size of 72 bytes (assuming 8 byte padding).

At a glance, I really like this approach. I am a tad concerned about performance, but it would make it quite a bit easier to reason about the ranges of the fields, which would certainly have its benefits (not to mention the general size reduction).

nekevss avatar May 16 '25 22:05 nekevss

So here are some general thoughts on the redesign mentioned above using the bnum idea.

We primarily need two public types, with potentially a third (need to double-check the public API): Duration and DateDuration. DateDuration is important because it is tied to the calendar API.

Currently, we have Duration defined as:

struct Duration {
    date: DateDuration,
    time: TimeDuration,
}

The difficulty with this using the unsigned approach is we would duplicate the Sign across types, which just isn't ideal.

So we flatten Duration into its fields and allow DateDuration to be converted to and from a Duration. This ends up with:

struct Duration {
    sign: Sign,
    years: u32,
    months: u32,
    weeks: u32,
    days: u40,
    hours: u48,
    minutes: u48,
    seconds: u56,
    milliseconds: u64,
    microseconds: u80,
    nanoseconds: u88,
}

struct DateDuration {
    sign: Sign,
    years: u32,
    months: u32,
    weeks: u32,
    days: u40, // = MAX_SAFE_INTEGER / 86_400
}

In this way, we completely remove TimeDuration from temporal_rs; the only method that needs to be preserved is from_normalized (which can be moved to NormalizedTimeDuration), but everything else can be removed.
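A minimal sketch of the conversion this design implies; plain u32/u64 stand in for the narrower bnum types (u40 days, etc.), and the field set is the one above:

```rust
// A shared sign enum, so the sign is not duplicated across types.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug, Default)]
enum Sign {
    #[default]
    Zero,
    Positive,
    Negative,
}

#[derive(Clone, Copy, Default)]
struct Duration {
    sign: Sign,
    years: u32,
    months: u32,
    weeks: u32,
    days: u64,
    hours: u64,
    minutes: u64,
    seconds: u64,
    milliseconds: u64,
    microseconds: u64,
    nanoseconds: u64,
}

#[derive(Clone, Copy)]
struct DateDuration {
    sign: Sign,
    years: u32,
    months: u32,
    weeks: u32,
    days: u64,
}

// A DateDuration is just a Duration whose time fields are all zero.
impl From<DateDuration> for Duration {
    fn from(d: DateDuration) -> Self {
        Duration {
            sign: d.sign,
            years: d.years,
            months: d.months,
            weeks: d.weeks,
            days: d.days,
            ..Duration::default()
        }
    }
}

fn main() {
    let date = DateDuration { sign: Sign::Positive, years: 1, months: 2, weeks: 0, days: 3 };
    let duration = Duration::from(date);
    assert_eq!(duration.days, 3);
    assert_eq!(duration.nanoseconds, 0);
}
```

The reverse direction would naturally be a fallible TryFrom, since it must reject a Duration with non-zero time fields.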

nekevss avatar May 17 '25 13:05 nekevss

Would we be re-ordering the fields for alignment and padding purposes, putting the larger fields at the top?

struct Duration {
    nanoseconds: u88,      // ~16 bytes
    microseconds: u80,     // ~12–16 bytes
    milliseconds: u64,     // 8 bytes
    seconds: u56,          // 8 bytes
    minutes: u48,          // 8 bytes
    hours: u48,            // 8 bytes
    days: u40,             // 8 bytes
    years: u32,            // 4 bytes
    months: u32,           // 4 bytes
    weeks: u32,            // 4 bytes
    sign: Sign,            // 1–4 bytes
}

jasonwilliams avatar May 18 '25 19:05 jasonwilliams

Would we be re-ordering the fields to for alignment and padding purposes

We can. I don't think it's as large a concern, since rustc should reorder the fields into the most efficient packing (I believe), but if we wanted to add a #[repr(C)] or something of the sort, then it may come into play.
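The difference is easy to demonstrate with a toy struct; field order only affects layout once automatic reordering is opted out of:

```rust
use std::mem::size_of;

// Default (Rust) repr: the compiler may reorder fields for packing.
#[allow(dead_code)]
struct Auto {
    a: u8,
    b: u64,
    c: u8,
}

// repr(C): declaration order is kept, so each u8 drags padding with it.
#[allow(dead_code)]
#[repr(C)]
struct Declared {
    a: u8,
    b: u64,
    c: u8,
}

fn main() {
    // rustc packs the two u8s next to the u64: 8 + 1 + 1, padded to 16.
    assert_eq!(size_of::<Auto>(), 16);
    // repr(C) keeps a, b, c in order: 1 + 7 pad + 8 + 1 + 7 pad = 24.
    assert_eq!(size_of::<Declared>(), 24);
}
```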

nekevss avatar May 19 '25 16:05 nekevss

(Author of Jiff here.)

One other thing to consider here is identifying fast paths in routines that consume a Duration. In Jiff, I added a bitset to its corresponding duration type that tracks all non-zero units. I found that this made it much quicker and easier to special case common operations. Perhaps there are better techniques, but as someone who has spent a bit of time trying to optimize this big honking duration type, I wanted to share a little bit of my experience here.
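The bitset idea can be sketched in a few lines; this is a loose illustration of the technique, not Jiff's actual implementation:

```rust
/// Bitset over the ten Duration units: bit 0 = years .. bit 9 = nanoseconds.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct UnitSet(u16);

impl UnitSet {
    fn from_fields(fields: &[i64; 10]) -> Self {
        let mut bits = 0u16;
        for (i, &field) in fields.iter().enumerate() {
            if field != 0 {
                bits |= 1 << i;
            }
        }
        UnitSet(bits)
    }

    /// Fast path: the duration is entirely zero.
    fn is_zero(self) -> bool {
        self.0 == 0
    }

    /// Fast path: exactly one non-zero unit.
    fn is_single_unit(self) -> bool {
        self.0.is_power_of_two()
    }
}

fn main() {
    let only_hours = UnitSet::from_fields(&[0, 0, 0, 0, 5, 0, 0, 0, 0, 0]);
    assert!(only_hours.is_single_unit());
    assert!(!only_hours.is_zero());

    let mixed = UnitSet::from_fields(&[1, 0, 0, 2, 0, 0, 0, 0, 0, 0]);
    assert!(!mixed.is_single_unit());
}
```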

The technique of boxing multi-unit durations and storing single-unit durations inline would help in some of these cases (although I've found other cases where knowing more info than just "is this a single unit duration" can be helpful). Although, it looks like y'all made Duration implement Copy (as did I in Jiff), which pretty much rules out the boxing technique.

BurntSushi avatar May 21 '25 14:05 BurntSushi

In Jiff, I added a bitset to its corresponding duration type that tracks all non-zero units.

This could be an interesting improvement on the current approach (which is pretty minimal, partly due to this open issue to move away from f64). Although, it might need to be done in a second optimization pass.

nekevss avatar May 21 '25 14:05 nekevss

If anyone following this thread can help me figure out why my logic in the PR to address this doesn't work, that would be great!! A bit stumped: https://github.com/boa-dev/temporal/pull/366

robot-head avatar Jul 04 '25 23:07 robot-head

We have the following failing test262 tests:

  'built-ins/Temporal/Duration/prototype/add/argument-duration-precision-exact-numerical-values': [FAIL],
  'built-ins/Temporal/Duration/prototype/round/out-of-range-when-converting-from-normalized-duration': [FAIL],
  'built-ins/Temporal/Duration/prototype/subtract/argument-duration-precision-exact-numerical-values': [FAIL],
  'built-ins/Temporal/Duration/prototype/total/precision-exact-mathematical-values-6': [FAIL],
  'built-ins/Temporal/Duration/prototype/total/precision-exact-mathematical-values-7': [FAIL],

It may be possible to fix them with just https://github.com/boa-dev/temporal/issues/334, but if not, I think we need this issue fixed.

Manishearth avatar Jul 23 '25 06:07 Manishearth

Duration/prototype/subtract/result-out-of-range-3.js also fails

A useful testcase is that Duration::new(0,0,0,0, 0,0,0,0, 9_007_199_254_740_991_926_258, 0) is supposed to fail validity: the microsecond value gets rounded up when converting to float, and it falls out of the valid range.

With that in mind I think we are required to store f64.

I suppose for the purposes of the validity check we could "normalize through f64" (convert and back) for a similar effect.
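A minimal sketch of that "normalize through f64" idea for the single-field case from the testcase above (the real IsValidDuration check sums every unit; this only handles microseconds):

```rust
// Check a microseconds value against the 2^53-second limit by going
// through f64 first, as the spec's float arithmetic effectively does.
fn micros_in_range(micros: i128) -> bool {
    let seconds = micros as f64 / 1e6;
    seconds.abs() < 9_007_199_254_740_992.0 // 2^53
}

fn main() {
    // In exact arithmetic this value is below the limit, but the f64
    // conversion rounds it up to exactly 2^53 seconds, so it is rejected.
    assert!(!micros_in_range(9_007_199_254_740_991_926_258));
    // The largest valid microsecond value from earlier still passes.
    assert!(micros_in_range(9_007_199_254_740_991_475_711));
}
```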

Manishearth avatar Jul 26 '25 04:07 Manishearth

With https://github.com/boa-dev/temporal/pull/457 we may not actually need the internal representation migrated, but it would probably be cleaner to do so.

Manishearth avatar Jul 26 '25 04:07 Manishearth

A useful testcase is that Duration::new(0,0,0,0, 0,0,0,0, 9_007_199_254_740_991_926_258, 0) is supposed to fail validity: the microsecond value gets rounded up when converting to float, and it falls out of the valid range.

This was covered a little earlier in the issue. There are three maximum Duration values depending on the field provided. The main plan was to validate against the intended maximum depending on the unit value.

I haven't gotten to addressing that because I was planning on doing so after the update.

For further test cases, see tc39/proposal-temporal#3106.

nekevss avatar Jul 26 '25 05:07 nekevss