Proposal: DateTime in std.time

Open tau-dev opened this issue 4 years ago • 33 comments

It could be very useful to have something like:

// Strictly Gregorian
const DateTime = struct {
    sec: u6, // [0, 60]
    min: u6, // [0, 59]
    hour: u5, // [0, 23] 
    year_day: u9, // [1, 366]
    year: i16, // C.E.

    pub fn monthDay(date: Self) u5; // [1, 31]
    pub fn month(date: Self) u4; // [1, 12]
    pub fn weekDay(date: Self) u8; // [0, 6]
    pub fn week(date: Self) u6; // [1, 54]
    pub fn isLeapYear(date: Self) bool;

    /// utc_offset is in minutes.
    pub fn fromEpoch(epoch: i64, utc_offset: i16) Self;
    pub fn toEpoch(date: Self, utc_offset: i16) i64;
};

// The POSIX epoch repeats on leap seconds; International Atomic Time (TAI) is useful for strictly monotonic timestamps.
// As far as I understood https://techcommunity.microsoft.com/t5/networking-blog/leap-seconds-for-the-appdev-what-you-should-know/ba-p/339813, 
// it seems like Windows' GetSystemTimeAsFileTime follows TAI.
pub fn epochToAtomicTime(epoch: i64, leap_seconds: []const i64) i64;
pub fn atomicTimeToEpoch(atomic: i64, leap_seconds: []const i64) i64;
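
A minimal sketch of how the forward conversion declared above might work. The table layout is an assumption that the proposal does not specify: each entry of `leap_seconds` is taken to be the POSIX timestamp of the first second after an inserted leap second, sorted ascending. The result is a monotonic second count, not true TAI, since the fixed 10 s TAI-UTC offset from 1972 and negative leap seconds are ignored.

```zig
const std = @import("std");

/// Sketch only. Assumes `leap_seconds` holds ascending POSIX timestamps,
/// each marking the first second after an inserted leap second.
pub fn epochToAtomicTime(epoch: i64, leap_seconds: []const i64) i64 {
    var inserted: i64 = 0;
    for (leap_seconds) |leap| {
        // Entries are sorted, so the first one in the future ends the scan.
        if (leap > epoch) break;
        inserted += 1;
    }
    // Every leap second already passed shifts the monotonic count by one.
    return epoch + inserted;
}
```

A linear scan is enough here because the real table has fewer than thirty entries.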


Maybe also with a more elegant way to format it than passing the fields separately to fmt.format.
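
As an illustration of why `year_day` plus `year` is enough to derive the calendar accessors in the struct above, here is a hedged sketch. `MonthDay` and `monthDayFromYearDay` are hypothetical names, not part of the proposal, and the free-function signatures differ from the methods declared there.

```zig
const std = @import("std");

/// Gregorian leap-year rule.
pub fn isLeapYear(year: i16) bool {
    return @mod(year, 4) == 0 and (@mod(year, 100) != 0 or @mod(year, 400) == 0);
}

/// Hypothetical helper type for this sketch.
pub const MonthDay = struct { month: u4, day: u5 };

/// `year_day` is 1-based and must be valid for `year`: [1, 365] or [1, 366].
pub fn monthDayFromYearDay(year: i16, year_day: u9) MonthDay {
    const days_in_month = [12]u9{ 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    var remaining: u9 = year_day;
    var month: u4 = 1;
    for (days_in_month, 0..) |base_len, i| {
        // February gains one day in leap years.
        const len: u9 = if (i == 1 and isLeapYear(year)) base_len + 1 else base_len;
        if (remaining <= len) break;
        remaining -= len;
        month += 1;
    }
    return .{ .month = month, .day = @intCast(remaining) };
}
```

The loop is the naive approach; the faster closed-form algorithms discussed later in the thread avoid the table walk.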

tau-dev avatar Mar 30 '21 20:03 tau-dev

I think having a proper well-made date/time library is important, but it's not a good idea to include this in std. A recommended date/time library would be better.

But yes, i agree that having a date-time api would be a good idea!

ikskuh avatar Mar 31 '21 10:03 ikskuh

This proposal is fairly similar to C's tm from time.h. Are there any guidelines as to what should be in zig's std library?

tau-dev avatar Mar 31 '21 11:03 tau-dev

    sec: u8, // [0, 60]
    min: u8, // [0, 59]
    hour: u8, // [0, 23] 
    year_day: u16, // [0, 365]

should be

    sec: u6, // [0, 60]
    min: u6, // [0, 59]
    hour: u5, // [0, 23]
    day: u5, // [1, 31]
    month: u4, // [1, 12]
    year_day: u9, // [1, 366]

Mouvedia avatar Mar 31 '21 13:03 Mouvedia

@Mouvedia

sec: u6, // [1, 59] is horribly wrong. This ignores leap seconds, also you cannot have 8:00 in the morning with this ;)

I plead everyone here to read Falsehoods programmers believe about time as it lists a whole bunch of misconceptions about date/time values

ikskuh avatar Mar 31 '21 13:03 ikskuh

Seconds are far, far too little resolution when atomic clocks are used as a reference (e.g. in GPS).

In glibc, struct timespec allows specifying nanoseconds, and was the result of a long, painful learning experience. Why not just add convenience accessor functions to specify days and months, hours, minutes and seconds, to a structure that is binary compatible with struct timespec?

RogierBrussee avatar Mar 31 '21 14:03 RogierBrussee

@RogierBrussee DateTime would be used for human-readable displaying, is there any use-case for nanoseconds there? How would you modify the proposal?

Why not just add convenience accessor functions to specify days and months, hours, minutes and seconds [...]?

That is exactly what this is.

... binary compatible with struct timespec

timespec is not cross-platform; the standard library functions return i64 second or i128 nanosecond timestamps.

@Mouvedia You're probably right. Modified the proposal.

tau-dev avatar Mar 31 '21 22:03 tau-dev

I think having a proper well-made date/time library is important, but it's not a good idea to include this in std. A recommended date/time library would be better.

There's a risk of becoming like Python's datetime with this approach. There's a datetime lib in std which does most of what you want, but doesn't actually work, and a 3rd party recommended lib which actually works, but is not fully compatible with the standard one. Whatever happens, half of a datetime lib in std is a bit painful :(

laserbeam3 avatar Apr 01 '21 05:04 laserbeam3

@InterplanetaryEngineer If the clock in your computer runs at 3.3 GHz then it has 1/3ns resolution, but admittedly that is not absolute time. The javascript DOMHighresolutionTimestamp() is in microseconds. Milliseconds, that's what the Olympic 100m is decided on. Mere seconds are just too low a resolution for many events, so it is better to have plenty of room below.

Your data model makes it impossible to have higher than 1 second resolution, however. For the interface, it should not matter what the data model is as long as it allows sufficient resolution: struct timespec is effectively an i128 worth of nanoseconds. I don't really see why it is not cross-platform (glibc is). Windows just does not use it in its native interfaces.
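
To make the point about the data model concrete, here is a small sketch of splitting an i128 nanosecond timestamp into a seconds part and a sub-second part. `Timespec` and `splitNanos` are hypothetical names for this example; the struct is not claimed to be binary compatible with the C `struct timespec`.

```zig
const std = @import("std");

/// Hypothetical seconds + nanoseconds pair for this sketch.
pub const Timespec = struct { sec: i64, nsec: u32 };

pub fn splitNanos(nanos: i128) Timespec {
    const ns_per_s = std.time.ns_per_s;
    return .{
        // Floor division keeps `nsec` non-negative for pre-1970 timestamps.
        .sec = @intCast(@divFloor(nanos, ns_per_s)),
        .nsec = @intCast(@mod(nanos, ns_per_s)),
    };
}
```

Using `@divFloor` and `@mod` rather than truncating division means one nanosecond before the epoch becomes second -1 plus 999,999,999 ns, so the sub-second field never goes negative.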

RogierBrussee avatar Apr 01 '21 10:04 RogierBrussee

The javascript DOMHighresolutionTimestamp() is in microseconds

It depends. Safari and Firefox have set their resolution to 1ms, others vary between 5µs and 100µs.

Mouvedia avatar Apr 01 '21 13:04 Mouvedia

@RogierBrussee see here. POSIX does not specify whether time_t is signed or unsigned. Some history.

A signed type has the advantage that it can represent events before midnight UTC on January 1, 1970. DateTime should offer some common comparison operations, so negative values would not be a significant source of errors.

Otherwise an unsigned representation would be better to prevent errors.

matu3ba avatar Apr 09 '21 23:04 matu3ba

Recently I've been working with @FObersteiner to improve my zig-tzif library. We stole some tests from Python's zoneinfo module. After getting those tests to pass, I decided that working with time zones would be much better in a library that included other DateTime functionality. So I revived my chrono-zig library and copied over the code from zig-tzif.

Long story short, I've been working on DateTime handling code and wouldn't mind contributing it to the standard library.

There are some things I want to add to it at the moment:

  • Verify that getting the local TimeZone works on macOS. I think it uses /etc/localtime and the IANA tzif database like Linux, but I'm not sure.
  • Update the chrono-zig formatting module to support the latest changes
  • Rename current types to match Working with Time and Timezones serializations and add any missing types
  • Look at multiple paths for the zoneinfo database like PEP615
  • Create a mechanism to ship the zoneinfo database with the binary, like PEP615
  • Map Windows time zone keys to IANA database identifiers. As far as I know, Windows doesn't use the IANA database. Windows has its own database of time zones with its own identifiers. But if we wanted to support something like WIP "Serialising Extended Data About Times and Events" in a consistent way, we would need to use the Windows API to get the local time zone and then map it to an IANA time zone somehow.
  • Learn about leap seconds and how to account for them in the API

Of course, I don't know if any or all of this should be in the standard library. C, C++, Python, and Go have DateTime implementations in the standard library; however, Rust leaves it to 3rd party crates like chrono. I think working with dates and times is common enough to warrant being in the standard library, and complex enough that rolling your own is likely to result in bugs. However, the heuristic of "is this necessary to implement a compiler" suggests that it shouldn't be in the standard library.

Either way I'll probably keep working on chrono-zig for the time being.

leroycep avatar Dec 10 '23 01:12 leroycep

As a small addition to @leroycep 's post above, Zig already has a TZif parser in the standard library, although this one is incomplete in my eyes since it doesn't handle POSIX TZ rules (what zig-tzif does). That seems inconsistent to me. A datetime library becomes very powerful if it has time zone support. But it feels strange to me to have time zone support but no datetime ;-) I guess the creator of this PR @Aransentin originally aimed at implementing full datetime support in Zig's std, but it was never finished.

FObersteiner avatar Dec 10 '23 17:12 FObersteiner

I guess the creator of https://github.com/ziglang/zig/pull/10456 @Aransentin originally aimed at implementing full datetime support in Zig's std, but it was never finished.

Yep, I just never finished it.

Tangentially datetime support is really nice to have in the stdlib for one simple reason, and that is that a whole bunch of cruddy protocols use dates as strings (HTTP, x509...). These are protocols which we do want in the standard library, which means any dependency needs to be included as well.
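
For a sense of what such protocol code needs, here is a hedged sketch that formats a Unix timestamp as an RFC 3339 UTC string. It assumes the `std.time.epoch` helpers (`EpochSeconds`, `calculateYearDay`, `calculateMonthDay`) as they exist in the standard library around the time of this thread; `formatRfc3339` is a hypothetical name, and only non-negative timestamps are handled.

```zig
const std = @import("std");

/// Writes e.g. "2000-02-29T12:34:56Z" into `buf` and returns the used slice.
pub fn formatRfc3339(buf: []u8, secs: u64) ![]u8 {
    const es = std.time.epoch.EpochSeconds{ .secs = secs };
    // Split into a calendar part and a time-of-day part.
    const year_day = es.getEpochDay().calculateYearDay();
    const month_day = year_day.calculateMonthDay();
    const day_secs = es.getDaySeconds();
    return std.fmt.bufPrint(buf, "{d:0>4}-{d:0>2}-{d:0>2}T{d:0>2}:{d:0>2}:{d:0>2}Z", .{
        year_day.year,
        month_day.month.numeric(),
        month_day.day_index + 1, // day_index is zero-based
        day_secs.getHoursIntoDay(),
        day_secs.getMinutesIntoHour(),
        day_secs.getSecondsIntoMinute(),
    });
}
```

A 20-byte buffer is sufficient for four-digit years; parsing the same format back is the harder half and is not shown.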

Aransentin avatar Dec 11 '23 18:12 Aransentin

@Aransentin I see, good point! and nice you came back here. Looking at PRs #3832, #9929 and #14537, it seems there have been multiple related attempts; especially #9929 has an interesting discussion that illustrates how many rabbit holes you can go down.

FObersteiner avatar Dec 11 '23 18:12 FObersteiner

sorry for the drive by comment as i haven't thoroughly read this issue, but i just wanted to mention this project https://github.com/cassioneri/eaf which seems like a very elegant and efficient approach to calendar math. i originally learned about it a few months ago watching 'Implementing Fast Calendar Algorithms - Speeding Date - Cassio Neri - CppNow 2023': https://www.youtube.com/watch?v=0s9F4QWAl-E

maybe someone more familiar with zig's DateTime needs can comment on whether this might be a valuable approach to study or follow.

travisstaloch avatar Dec 14 '23 12:12 travisstaloch

@travisstaloch thanks for sharing, I'd probably never have found that repo, with this title ^^ It would be very interesting to see how the performance benchmark translates for a zig implementation. I didn't dig deeper but I think libc++ is Howard Hinnant's algorithm, which to me seems a bit simpler than the fastest competitor. But I might be totally wrong on this one from a compiler's perspective ;-)
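
For readers who have not seen it, here is a sketch of Hinnant-style day counting transcribed to Zig. It follows the publicly documented `days_from_civil` / `civil_from_days` formulas; it is not the Neri-Schneider algorithm from the linked repository, and the `Date` type and function names are chosen for this example only.

```zig
const std = @import("std");

pub const Date = struct { year: i64, month: i64, day: i64 };

/// Days since 1970-01-01 for a proleptic Gregorian date.
pub fn daysFromCivil(date: Date) i64 {
    // Treat March as the first month so the leap day falls at year end.
    const y = if (date.month <= 2) date.year - 1 else date.year;
    const era = @divFloor(y, 400);
    const yoe = y - era * 400; // year of era, [0, 399]
    const mp = if (date.month > 2) date.month - 3 else date.month + 9;
    const doy = @divFloor(153 * mp + 2, 5) + date.day - 1; // day of year, [0, 365]
    const doe = yoe * 365 + @divFloor(yoe, 4) - @divFloor(yoe, 100) + doy;
    return era * 146097 + doe - 719468;
}

/// Inverse of `daysFromCivil`.
pub fn civilFromDays(days: i64) Date {
    const z = days + 719468;
    const era = @divFloor(z, 146097);
    const doe = z - era * 146097; // day of era, [0, 146096]
    const yoe = @divFloor(doe - @divFloor(doe, 1460) + @divFloor(doe, 36524) - @divFloor(doe, 146096), 365);
    const doy = doe - (365 * yoe + @divFloor(yoe, 4) - @divFloor(yoe, 100));
    const mp = @divFloor(5 * doy + 2, 153);
    const day = doy - @divFloor(153 * mp + 2, 5) + 1;
    const month = if (mp < 10) mp + 3 else mp - 9;
    const base_year = yoe + era * 400;
    // January and February belong to the following civil year.
    const year = if (month <= 2) base_year + 1 else base_year;
    return .{ .year = year, .month = month, .day = day };
}
```

Everything is kept in `i64` with `@divFloor` for simplicity; a tuned version would use narrower unsigned types once the era has been extracted.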

FObersteiner avatar Dec 14 '23 14:12 FObersteiner

@FObersteiner i started working on a zig implementation here https://github.com/travisstaloch/date-zig. i haven't done any benchmarking yet, just working on correctness so far.

i did some work toward integrating w/ std.time.Instant and currently that seems to work pretty well on posix. but anything non-posix hasn't been implemented yet.

EDIT: correction - I actually copied some of std.time.Instant into the lib to make it an extern struct so that i could export a c API.

travisstaloch avatar Dec 17 '23 06:12 travisstaloch

@FObersteiner got a notification with some benchmarking results and a link here. but it seems to be gone. just wanted to let you know in case there was some kind of a github glitch.

travisstaloch avatar Dec 18 '23 02:12 travisstaloch

Hehe interesting... yes @travisstaloch I was playing around with the Neri-Schneider and Hinnant algorithms yesterday, prepared a reply, but then wasn't sure if the results were meaningful at all and deleted it. I just checked again and it seems to be ok... you can see for yourself:

  • source - note that I made some adjustments to the algorithms (no bounds checks etc.) so that they fit in with what I already had. No changes made to the "core" though.
  • benchmark - I'm using zbench here

As a result, without optimizations, I get [Hinnant / Neri-Schneider] on my i5-1240P notebook running Linux:

  • days --> date: 1.2
  • date --> days: 3.1

This is mostly consistent with what I got on another, older machine. If I compile with the ReleaseFast or ReleaseSafe option however, results become highly implausible. The compiler might have figured out that my benchmark "functions" actually don't do anything. It might also be a quirk of how the zBench code gets optimized, not sure.

Long story short, at the moment I don't see a reason to prefer another algorithm over the Neri-Schneider. So why not go with those in the Zig standard library?

FObersteiner avatar Dec 18 '23 07:12 FObersteiner

If I compile with the ReleaseFast or ReleaseSafe option however, results become highly implausible.

thanks for the report and making the benchmarks! i ran this benchmark on my machine - an AMD 5700x - and saw similar results. and w/ ReleaseFast, all 4 entries ran in exactly 24ns indicating that they had been optimized away. so i added this diff:

diff --git a/src/benchmark.zig b/src/benchmark.zig
index dea9565..db23b79 100644
--- a/src/benchmark.zig
+++ b/src/benchmark.zig
@@ -9,6 +9,7 @@ fn bench_dateFromUnix_Hinnant(b: *zbench.Benchmark) void {
     var j: i32 = 1;
     while (j < 10_000) : (j += 1) {
         tmp = cal.dateFromUnixdays(j);
+        std.mem.doNotOptimizeAway(tmp);
     }
 }
 
@@ -19,6 +20,7 @@ fn bench_unixFromDate_Hinnant(b: *zbench.Benchmark) void {
     var j: u16 = 1;
     while (j < 10_000) : (j += 1) {
         tmp = cal.unixdaysFromDate([3]u16{ j, 1, 1 });
+        std.mem.doNotOptimizeAway(tmp);
     }
 }
 
@@ -29,6 +31,7 @@ fn bench_dateFromUnix_NeriSchneider(b: *zbench.Benchmark) void {
     var j: i32 = 1;
     while (j < 10_000) : (j += 1) {
         tmp = cal.rdToDate(j);
+        std.mem.doNotOptimizeAway(tmp);
     }
 }
 
@@ -39,6 +42,7 @@ fn bench_unixFromDate_NeriSchneider(b: *zbench.Benchmark) void {
     var j: u16 = 1;
     while (j < 10_000) : (j += 1) {
         tmp = cal.dateToRD([3]u16{ j, 1, 1 });
+        std.mem.doNotOptimizeAway(tmp);
     }
 }

This resulted in these at least differing (and more plausible?) results:

/tmp/zdt $ zig build benchmark -Doptimize=ReleaseFast && zig-out/bin/benchmark 
Test [1/4] test.bench Neri-Schneider, days -> date... Total operations: 43844
benchmark            time (avg)   (min ... max)        p75        p99        p995      
--------------------------------------------------------------------------------------
Neri-Schneider, rd to date 22.812µs     (22.769µs ... 50.460µs) 22.780µs   23.769µs   24.840µs  
Test [2/4] test.bench Neri-Schneider, date -> days... Total operations: 59468
benchmark            time (avg)   (min ... max)        p75        p99        p995      
--------------------------------------------------------------------------------------
Neri-Schneider, date to rd 5.615µs      (5.599µs ... 25.720µs) 5.610µs    5.610µs    5.610µs   
Test [3/4] test.bench Hinnant, days -> date... Total operations: 20130
benchmark            time (avg)   (min ... max)        p75        p99        p995      
--------------------------------------------------------------------------------------
Hinnant, days to civil 49.654µs     (49.459µs ... 69.869µs) 49.480µs   54.890µs   56.700µs  
Test [4/4] test.bench Hinnant, date -> days... Total operations: 36662
benchmark            time (avg)   (min ... max)        p75        p99        p995      
--------------------------------------------------------------------------------------
Hinnant, civil to days 13.667µs     (13.629µs ... 31.109µs) 13.640µs   14.480µs   14.869µs  
All 4 tests passed.
/tmp/zdt $ 

I'm not sure this is the correct way to use mem.doNotOptimizeAway() here, but it has definitely changed the outcome. Would you say these results seem reasonable?

travisstaloch avatar Dec 18 '23 12:12 travisstaloch

I'm not sure this is the correct way to use mem.doNotOptimizeAway() here, but it has definitely changed the outcome. Would you say these results seem reasonable?

ah that looks better. Neri-Schneider being 2-3x faster agrees pretty well with what Cassio Neri shows in his presentation.

FObersteiner avatar Dec 18 '23 13:12 FObersteiner

Zig's current date time needs are for:

  1. Emitting C date and time macros in aro/Compilation.zig
  2. Certificate DER parsing in Certificate.zig
  3. Extern UEFI structs in os/uefi.zig

They all need fromEpoch and toEpoch, so I'll make a PR to unify 1 and 2 into one DateTime type that uses the aforementioned Euclidean Affine Transforms and have 3 with its differing type call into it. Here's why.

I've read a few libraries in various languages and they tend to have 2-3 layers:

  1. Counting functions (days in year, days in month, etc.). They all use various tricks, of which Euclidean Affine Transforms are the newest and fastest.
  2. Calendar, Time, and Duration types. Calendar and Time types have various sizes and precisions. Calendar types are sometimes namespaced by system.
  3. Leap seconds, timezones, daylight savings, and localization.

Edit: I think what belongs in the std is what I've PRed in #19549 .

Timezone support may be possible to add since Linux, Mac, and Windows all have standard file locations for timezone databases and a similar search is done for certificate bundles for TLS. If full portability is desired, tzdata2024a.tar.gz is 440.7kb, although it does need regular updating. I don't know exactly what's required for leap second support yet.

After getting lost in how Chromium handles localization, I think parsing/formatting strings other than ISO 8601 and RFC 3339 is outside the scope of the standard library.

clickingbuttons avatar Apr 02 '24 05:04 clickingbuttons

so I'll make a PR to unify 1 and 2 into one DateTime type that uses the

excited to see your PR :)

In general, I still think having basic date/time functionality in the std lib would be cool. If this provides an interface for time zones to hook in - great. The tzdata in the std lib? Not so sure. Keep in mind, time zone rules are subject to political decisions, which can come without much notice ahead of time. So any time some country decides to change their tz rules, we would have to update the Zig std lib - which sounds... strange to me. We had this discussion over here.

FObersteiner avatar Apr 05 '24 09:04 FObersteiner

I didn't read that PR before making my own. It seems to have taken a substantially different route than I did, so I'm happy there was little (if any) overlap with @Vexu 's work.

After reading the discussion you linked I agree shipping TZ data feels strange. I am not in favor of it.

clickingbuttons avatar Apr 05 '24 17:04 clickingbuttons

I think the standard library should support bundling tzdata into a binary, but it probably shouldn't include the data itself. Some way to hook into the TZ loading code should be enough.

leroycep avatar Apr 05 '24 18:04 leroycep

On Unix-y machines, there seems to be no need to include the time zone information; it's typically located at /usr/share/zoneinfo. Windows stores it in the Registry (ew!). For other use cases, which we should definitely support, we may want to include an option for reading tzfile(5) files. The contents of my /usr/share/zoneinfo, tar'd and compressed with zstd -19, are only 119K, making it viable to compile them into the binary. In any case, not using the OS-provided time zone database is unacceptable, and exposing the implementation details of the time zone database to the user is terrible API design.

An API for reading the time zone database like this might be desirable:

   pub fn TzDb(comptime TzDbContext: type) type {...}

   pub const UnixLikeContext = struct {
        zoneinfo: std.fs.Dir,
      
        pub fn init(zoneinfo_dir: ?std.fs.Dir) !UnixLikeContext {
            return .{
                .zoneinfo = zoneinfo_dir orelse try findZoneinfo(),
            };
        }
      
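        pub fn findZoneinfo() !std.fs.Dir {
            // An array (rather than a tuple) so a plain runtime `for` can iterate it.
            const dirs = [_][]const u8{ "/usr/share/zoneinfo", "/usr/local/share/zoneinfo" };
            for (dirs) |dir| {
                // Open the path as a directory and return the first one that exists.
                return std.fs.openDirAbsolute(dir, .{}) catch |err| switch (err) {
                    error.FileNotFound => continue,
                    else => |e| return e,
                };
            }
            return error.NoZoneinfoDir;
        }
      
        pub fn readZoneinfo(context: UnixLikeContext, timezone: []const u8, alist: *std.ArrayList(u8)) !void {
            // Look the zone up relative to the zoneinfo directory held by the context.
            const f = try context.zoneinfo.openFile(timezone, .{});
            defer f.close();
            try f.reader().readAllArrayList(alist, std.math.maxInt(usize));
        }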
   };

This should enable reading directly from a compiled time zone database. It might create a reliance on the tzfile format, which may or may not be a problem when dealing with Windows.


holy shit, I hate time, I hope I never have to write time-related code in my life

notcancername avatar Apr 06 '24 16:04 notcancername

On Unix-y machines, there seems to be no need to include the time zone information

At least in a WebAssembly environment, there's no way to access that database.

jedisct1 avatar Apr 06 '24 19:04 jedisct1

@jedisct1, if WebAssembly has access to browser APIs, then it can read the time zone database. WASI is able to read the time zone database on Unix-y systems if it is given read access to the directory where zoneinfo is located.

notcancername avatar Apr 06 '24 19:04 notcancername

Re: Windows:

https://learn.microsoft.com/en-us/windows/win32/api/timezoneapi/ns-timezoneapi-time_zone_information

yikes.

A table of leap seconds does not seem to be stored on Windows, so it seems like a Windows implementation of the proposed context would have to do something like:

const leap_seconds =
mucho texto
\\Leap	1972	Jun	30	23:59:60	+	S
\\Leap	1972	Dec	31	23:59:60	+	S
\\Leap	1973	Dec	31	23:59:60	+	S
\\Leap	1974	Dec	31	23:59:60	+	S
\\Leap	1975	Dec	31	23:59:60	+	S
\\Leap	1976	Dec	31	23:59:60	+	S
\\Leap	1977	Dec	31	23:59:60	+	S
\\Leap	1978	Dec	31	23:59:60	+	S
\\Leap	1979	Dec	31	23:59:60	+	S
\\Leap	1981	Jun	30	23:59:60	+	S
\\Leap	1982	Jun	30	23:59:60	+	S
\\Leap	1983	Jun	30	23:59:60	+	S
\\Leap	1985	Jun	30	23:59:60	+	S
\\Leap	1987	Dec	31	23:59:60	+	S
\\Leap	1989	Dec	31	23:59:60	+	S
\\Leap	1990	Dec	31	23:59:60	+	S
\\Leap	1992	Jun	30	23:59:60	+	S
\\Leap	1993	Jun	30	23:59:60	+	S
\\Leap	1994	Jun	30	23:59:60	+	S
\\Leap	1995	Dec	31	23:59:60	+	S
\\Leap	1997	Jun	30	23:59:60	+	S
\\Leap	1998	Dec	31	23:59:60	+	S
\\Leap	2005	Dec	31	23:59:60	+	S
\\Leap	2008	Dec	31	23:59:60	+	S
\\Leap	2012	Jun	30	23:59:60	+	S
\\Leap	2015	Jun	30	23:59:60	+	S
\\Leap	2016	Dec	31	23:59:60	+	S
;

if (std.mem.eql(u8, name, "leapseconds")) {
    try alist.appendSlice(leap_seconds);
    return;
}

// ... read registry

notcancername avatar Apr 06 '24 20:04 notcancername

wasn't it established previously that something as big and use case-dependent as this should be prototyped in 3rd party packages first, especially now that the package manager exists?

nektro avatar Apr 07 '24 00:04 nektro