Investigate migrating `Duration`'s internal representation from `FiniteF64` to `i64` or potentially `u32` and `u64`.
Currently, `Duration` uses `FiniteF64` for its fields. It would be nice to move away from `FiniteF64` for the internal representation and use `i64`, or `u32` and `u64`.
Below are some pseudo-representations (not including `DateDuration` or `TimeDuration`):
```rust
struct Duration {
    years: i64,
    months: i64,
    weeks: i64,
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
    milliseconds: i64,
    microseconds: i64,
    nanoseconds: i64,
}
```
```rust
struct Duration {
    sign: Sign,
    years: u32,
    months: u32,
    weeks: u32,
    days: u64,
    hours: u64,
    minutes: u64,
    seconds: u64,
    milliseconds: u64,
    microseconds: u64,
    nanoseconds: u64,
}
```
General things that need to be considered:
- How will this interface with engines where inputs may be anywhere in the range up to 2^53 − 1? (See the conversion sketch after this list.)
- Is there any point in the operations of the Temporal specification where precision on an unbalanced duration may matter?
- Should `DateDuration` and `TimeDuration` be preserved, in favor of using the above as the external `Duration` and a new "internal Duration" for calculation purposes?
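On the first point, here is a minimal conversion sketch, assuming fields arrive from a JS engine as `f64` values bounded by `Number.MAX_SAFE_INTEGER`; the `TemporalError` type and the function name are hypothetical, for illustration only:

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug)]
enum TemporalError {
    Range,
}

/// Converts an engine-provided f64 field into an i64, validating
/// defensively even though the engine should stay within ±(2^53 - 1).
fn field_from_f64(value: f64) -> Result<i64, TemporalError> {
    const MAX_SAFE_INTEGER: f64 = 9_007_199_254_740_991.0; // 2^53 - 1

    // Reject NaN, infinities, and non-integral values up front.
    if !value.is_finite() || value.fract() != 0.0 {
        return Err(TemporalError::Range);
    }
    if value.abs() > MAX_SAFE_INTEGER {
        return Err(TemporalError::Range);
    }
    // Safe: every integer up to 2^53 - 1 is exactly representable in i64.
    Ok(value as i64)
}
```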
Update on the progress here.
Currently, the Temporal proposal relies on floating-point arithmetic behavior for integers above MAX_SAFE_INTEGER when validating a Duration in IsValidDuration. This may also extend to add and subtract operations. This complicates the representation that can be used for milliseconds, microseconds, and nanoseconds, because both the microsecond and nanosecond maxima exceed the range of i64 and u64.
| Unit | Largest Valid Duration |
|---|---|
| nanosecond | 9_007_199_254_740_991_463_129_087 |
| microsecond | 9_007_199_254_740_991_475_711 |
| millisecond | 9_007_199_254_740_991_487 |
Note the size of i64::MAX and u64::MAX vs. the largest microsecond value:
```text
9_223_372_036_854_775_807      // i64::MAX
18_446_744_073_709_551_615     // u64::MAX
9_007_199_254_740_991_475_711  // largest valid microseconds
```
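To make the mismatch concrete, a quick standalone check (plain Rust, no external crates) showing that the largest valid microseconds value overflows both primitive 64-bit types:

```rust
fn main() {
    let largest_micros: i128 = 9_007_199_254_740_991_475_711;
    // Both conversions fail: the value needs more than 64 bits.
    assert!(i64::try_from(largest_micros).is_err());
    assert!(u64::try_from(largest_micros).is_err());
    println!("largest valid microseconds overflows i64 and u64");
}
```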
@nekevss Is that intentional from the spec, or can we ask the champions to relax the range requirements for microseconds and nanoseconds to fit the range of an `i64`?
The issue filed on the proposal is here: https://github.com/tc39/proposal-temporal/issues/3106
We couldn't have nanoseconds be in the i64 range even if we wanted to (I believe it wouldn't be able to support the JS Date range: ±8,640,000,000,000,000 ms is roughly ±8.6 × 10^21 ns, well past i64::MAX). There's potential for microseconds to be in an i64 range if the check were shifted from seconds to milliseconds. I'm not sure if that will happen.
I think best case scenario if there were updates would be:
```rust
struct Duration {
    years: i32,
    months: i32,
    weeks: i32,
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
    milliseconds: i64,
    microseconds: i64,
    nanoseconds: i128,
}
```
They would have to adjust the validity checks on years, months, and weeks, and shift the time-unit check. It would eliminate one of the max duration options and save 4 bytes in the best case.
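For what it's worth, here is a quick way to check the size claim; the exact figure depends on i128's alignment on the target, so padding may eat the saving (the struct and main are just a throwaway sketch):

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct BestCaseDuration {
    years: i32,
    months: i32,
    weeks: i32,
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
    milliseconds: i64,
    microseconds: i64,
    nanoseconds: i128,
}

fn main() {
    // Field bytes sum to 76; i128 alignment may pad this back to 80,
    // the same as ten f64s.
    println!("BestCaseDuration: {} bytes", size_of::<BestCaseDuration>());
}
```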
Another option would be to specify the fields differently depending on feature flagging: keep the floating point for JavaScript, and lessen the range for native Rust. I'm not really sure, tbh. It's a really strange type to optimize.
Any ideas?
EDIT: also, interestingly, I think the unsigned version would be more bytes, not less ... although I haven't actually checked the layout and alignment, so that may throw a hitch into which is better.
Wild idea, what if we use something like:
```rust
struct Duration {
    years: i32,
    months: i32,
    weeks: i32,
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
    milliseconds: i64,
    microseconds: i80,
    nanoseconds: i128,
}
```
It should cover the larger microseconds range, and thanks to alignment it still should have the same size as the original Duration. I think the bnum crate implements a packed i80 type which uses five u16 instead of an i128.
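As a rough sketch of what that could look like with bnum (treat the exact generic names as an assumption; in my reading of the crate's docs, `BIntD16<N>` is a signed integer made of N sixteen-bit digits):

```rust
use bnum::BIntD16;

// 80-bit signed integer backed by five u16 digits.
type I80 = BIntD16<5>;

fn main() {
    // 2^79 - 1 comfortably covers the largest valid microseconds value.
    println!("I80::MAX = {}", I80::MAX);
}
```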
That's super interesting, and I'm going to be taking a look at that crate.
It's an option, but we still have to use `i64` currently for years, months, and weeks unless we change to the unsigned type, so the type is still larger than what it could be.
Also, just to note: we can use an i80 and i88 to represent the current fields if we are trying to be super efficient.
```text
i88::MAX (2^87) = 154_742_504_910_672_534_362_390_528  // nanoseconds fit!
i80::MAX (2^79) = 604_462_909_807_314_587_353_088      // microseconds fit!
```
When designing the proposal's current limits, we did not expect that any implementation would want to store the components for the user-facing Temporal.Duration as anything but 10× f64, because those f64s are exposed directly as user-facing values of Temporal.Duration's properties. So having the Duration struct be a good Rust-programmer-facing API as well as a backing store for the JS object is a use case we honestly did not anticipate 😄
If you use bnum, take the smaller maximum values for days, hours, and minutes into account, and store the sign as a separate bit, you could have something like the following:
```rust
struct Duration {
    years: u32,
    months: u32,
    weeks: u32,
    days: u40,
    hours: u48,
    minutes: u48,
    seconds: u56,
    milliseconds: u64,
    microseconds: u80,
    nanoseconds: u88,
    sign: bool,
}
```
If I'm counting right that would give you a size of 72 bytes (assuming 8 byte padding).
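For reference, the arithmetic behind that count, assuming each bnum type packs into the minimal number of bytes: 3 × u32 (12 bytes) + u40 (5) + 2 × u48 (12) + u56 (7) + u64 (8) + u80 (10) + u88 (11) + bool (1) = 66 bytes, which rounds up to 72 with 8-byte alignment.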
Another idea that we handwavingly threw around is that Durations will be single-component in the most common case. So if you weren't attached to moving away from floats but still wanted to reduce wasted space in the struct, you could do something like this:
```rust
enum DurationComponent {
    Years,
    Months,
    Weeks,
    Days,
    Hours,
    Minutes,
    Seconds,
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

enum Duration {
    SingleComponent { value: f64, component: DurationComponent },
    FullComponents { values: Box<[f64; 10]> },
}
```
That would give you a struct size of 16 bytes, at the cost of having to allocate an extra 80 bytes in the less common case.
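A quick standalone check of the 16-byte claim (Rust does not guarantee this layout, but it is what current rustc produces on typical 64-bit targets):

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum DurationComponent {
    Years, Months, Weeks, Days, Hours,
    Minutes, Seconds, Milliseconds, Microseconds, Nanoseconds,
}

#[allow(dead_code)]
enum Duration {
    SingleComponent { value: f64, component: DurationComponent },
    FullComponents { values: Box<[f64; 10]> },
}

fn main() {
    // 8 (f64) + 1 (component) + 1 (discriminant), rounded up to align(8):
    // 16 bytes on typical 64-bit targets.
    println!("Duration: {} bytes", size_of::<Duration>());
}
```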
> If you use bnum, take the smaller maximum values for days, hours, and minutes into account, and store the sign as a separate bit, you could have something like the following: [...] If I'm counting right that would give you a size of 72 bytes (assuming 8 byte padding).
At a glance, I really like this approach. I am a tad concerned about performance, but this would make it quite a bit easier to reason about the ranges of the fields, which would certainly have its benefits (not to mention the general size reduction).
So here are some general thoughts on the redesign mentioned above using the bnum idea.
We primarily need two public types, with potentially a third (need to double-check the public API): Duration and DateDuration. DateDuration is important because it is tied to the calendar API.
Currently, we have Duration defined as:
```rust
struct Duration {
    date: DateDuration,
    time: TimeDuration,
}
```
The difficulty with the unsigned approach is that we would duplicate the Sign across types, which just isn't ideal. So we flatten Duration into its fields and allow DateDuration to be converted to and from a Duration. This ends up with:
```rust
struct Duration {
    sign: Sign,
    years: u32,
    months: u32,
    weeks: u32,
    days: u40,
    hours: u48,
    minutes: u48,
    seconds: u56,
    milliseconds: u64,
    microseconds: u80,
    nanoseconds: u88,
}
```
```rust
struct DateDuration {
    sign: Sign,
    years: u32,
    months: u32,
    weeks: u32,
    days: u40, // = MAX_SAFE_INTEGER / 86_400
}
```
In this way, we completely remove TimeDuration from temporal_rs. The only method that needs to be preserved is from_normalized (which can be moved to NormalizedTimeDuration); everything else can be removed.
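A minimal sketch of that conversion, assuming the flattened types above (with the uNN widths standing in for the bnum-backed types) and a Default impl that zeroes every field:

```rust
impl From<DateDuration> for Duration {
    fn from(date: DateDuration) -> Self {
        Duration {
            sign: date.sign,
            years: date.years,
            months: date.months,
            weeks: date.weeks,
            days: date.days,
            // All time fields stay zero for a pure date duration.
            ..Duration::default()
        }
    }
}
```

The reverse direction would presumably be a fallible TryFrom, failing when any time field is non-zero.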
Would we be re-ordering the fields for alignment and padding purposes, putting the larger fields at the top?
```rust
struct Duration {
    nanoseconds: u88,   // ~16 bytes
    microseconds: u80,  // ~12–16 bytes
    milliseconds: u64,  // 8 bytes
    seconds: u56,       // 8 bytes
    minutes: u48,       // 8 bytes
    hours: u48,         // 8 bytes
    days: u40,          // 8 bytes
    years: u32,         // 4 bytes
    months: u32,        // 4 bytes
    weeks: u32,         // 4 bytes
    sign: Sign,         // 1–4 bytes
}
```
> Would we be re-ordering the fields for alignment and padding purposes
We can. I don't think it's as large a concern, since rustc should reorder the fields into the most efficient packing (I believe), but if we wanted to add a `#[repr(C)]` or something of the sort, then it may come into play.
(Author of Jiff here.)
One other thing to consider here is identifying fast paths in routines that consume a Duration. In Jiff, I added a bitset to its corresponding duration type that tracks all non-zero units. I found that this made it much quicker and easier to special case common operations. Perhaps there are better techniques, but as someone who has spent a bit of time trying to optimize this big honking duration type, I wanted to share a little bit of my experience here.
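A minimal sketch of that idea (names are hypothetical and loosely modeled on the description above, not on Jiff's actual internals):

```rust
/// One bit per duration unit, years through nanoseconds.
#[derive(Clone, Copy, Default, PartialEq, Eq)]
struct UnitSet(u16);

impl UnitSet {
    /// Records that the unit at `unit_index` (0 = years .. 9 = nanoseconds)
    /// is non-zero.
    fn mark_nonzero(&mut self, unit_index: u8) {
        self.0 |= 1 << unit_index;
    }

    /// Fast path: the zero duration.
    fn is_empty(self) -> bool {
        self.0 == 0
    }

    /// Fast path: at most one unit is non-zero
    /// (clearing the lowest set bit leaves nothing).
    fn is_single_unit(self) -> bool {
        self.0 & self.0.wrapping_sub(1) == 0
    }
}
```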
The technique of boxing multi-unit durations and storing single-unit durations inline would help in some of these cases (although I've found other cases where knowing more info than just "is this a single unit duration" can be helpful). Although, it looks like y'all made Duration implement Copy (as did I in Jiff), which pretty much rules out the boxing technique.
> In Jiff, I added a bitset to its corresponding duration type that tracks all non-zero units.
This could be an interesting improvement on the current approach (which is pretty minimal, due to the open issue to move away from f64). Although, it might need to be done in a second optimization pass.
If anyone following this thread can help me figure out why my logic in the PR to address this doesn't work, that would be great!! A bit stumped: https://github.com/boa-dev/temporal/pull/366
We have the following failing test262 tests:
```text
'built-ins/Temporal/Duration/prototype/add/argument-duration-precision-exact-numerical-values': [FAIL],
'built-ins/Temporal/Duration/prototype/round/out-of-range-when-converting-from-normalized-duration': [FAIL],
'built-ins/Temporal/Duration/prototype/subtract/argument-duration-precision-exact-numerical-values': [FAIL],
'built-ins/Temporal/Duration/prototype/total/precision-exact-mathematical-values-6': [FAIL],
'built-ins/Temporal/Duration/prototype/total/precision-exact-mathematical-values-7': [FAIL],
```
It may be possible to fix them with just https://github.com/boa-dev/temporal/issues/334, but if not, I think we need this issue fixed.
`Duration/prototype/subtract/result-out-of-range-3.js` also fails.
A useful test case is that `Duration::new(0,0,0,0, 0,0,0,0, 9_007_199_254_740_991_926_258, 0)` is supposed to fail validity: the microsecond value gets rounded up when converting to float, and it falls out of the valid range.
With that in mind, I think we are required to store f64.
I suppose for the purposes of the validity check we could "normalize through f64" (convert and back) for a similar effect.
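A sketch of that round-trip, assuming integer-backed fields (the helper name is made up):

```rust
/// Round-trips an integer field through f64 so the validity check observes
/// the same rounding a float-based check would.
fn normalize_through_f64(value: i128) -> i128 {
    (value as f64) as i128
}

fn main() {
    let micros: i128 = 9_007_199_254_740_991_926_258;
    // Per the test case above, the f64 conversion rounds this up
    // (to exactly 2^53 * 10^6), pushing it past the valid limit.
    assert!(normalize_through_f64(micros) > micros);
}
```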
With https://github.com/boa-dev/temporal/pull/457 we may not actually need the internal representation migrated, but it would probably be cleaner to do so.
> A useful test case is that `Duration::new(0,0,0,0, 0,0,0,0, 9_007_199_254_740_991_926_258, 0)` is supposed to fail validity: the microsecond value gets rounded up when converting to float, and it falls out of the valid range.
This was covered a little earlier in the issue. There are three maximum Duration values depending on the field provided. The main plan was to validate against the intended duration limit depending on the unit value.
I haven't gotten to addressing that because I was planning on doing so after the update.
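For reference, the three per-unit maxima from the table earlier in the thread, as they might appear as validation constants (the names are hypothetical):

```rust
// Largest valid Duration value when the given unit is the only
// non-zero field, per the table earlier in this thread.
const MAX_DURATION_NANOSECONDS: i128 = 9_007_199_254_740_991_463_129_087;
const MAX_DURATION_MICROSECONDS: i128 = 9_007_199_254_740_991_475_711;
const MAX_DURATION_MILLISECONDS: i64 = 9_007_199_254_740_991_487;
```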
For further test cases, see tc39/proposal-temporal#3106.