icu4x
TZDB Datagen
Adds data generation for historic time zone transitions as well as time zone transition rules.
Historic Transitions
A multi-dimensional mapping from BCP47 TZID to daylight/GMT info to a list of historic timestamps at which a daylight saving time transition occurred.
e.g. (not representative of the data's actual format in ICU4X)
uslax => { GMT-8, DST(false) } => [ timestamp_1, timestamp_2, ... timestamp_n ]
uslax => { GMT-7, DST(true) } => [ timestamp_1, timestamp_2, ... timestamp_n ]
Given a ZonedDateTime, the historic time zone transitions are effectively a lookup table that can be used to determine the GMT offset, and whether the variant is standard or daylight time, at a given point in history.
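The lookup described above can be sketched roughly as follows. This is not the data layout used in ICU4X: the `Transition` struct and the `lookup` function are hypothetical names, and the sketch assumes the per-variant timestamp lists have been flattened into one list sorted by timestamp (seconds since the Unix epoch).

```rust
/// Hypothetical record: the offset and variant that take effect at `since`
/// (seconds since the Unix epoch). Not the ICU4X data struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Transition {
    pub since: i64,
    pub offset_seconds: i32,
    pub is_dst: bool,
}

/// Returns the transition in effect at `timestamp`, or `None` when the
/// timestamp precedes every known transition.
/// Assumes `transitions` is sorted by `since` in ascending order.
pub fn lookup(transitions: &[Transition], timestamp: i64) -> Option<Transition> {
    // Index of the first transition that starts after `timestamp`.
    let idx = transitions.partition_point(|t| t.since <= timestamp);
    // The entry just before that index is the one in effect, if any.
    idx.checked_sub(1).map(|i| transitions[i])
}
```

A binary search keeps the lookup logarithmic in the number of transitions, which matters for zones with long histories.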
Transition Rules
A mapping from BCP47 TZID to information about daylight saving transition offsets and when they occur.
e.g. (not representative of the data's actual format in ICU4X)
uslax => {
    STD(GMT-8),
    DST(GMT-7),
    DSTStart {
        Month(3),
        Week(2),
        Day(0),
        Time(2:00 AM),
    },
    DSTEnd {
        Month(11),
        Week(1),
        Day(0),
        Time(2:00 AM),
    }
}
The transition rules provide the current GMT offsets as well as the day of the year and time of day at which daylight saving time transitions occur in a given time zone. They can be used to determine the GMT offset and daylight variant for ZonedDateTimes in the future, as a fallback when no historic data is available, or as an extremely lightweight dataset for an application that only formats current-time dates and not dates that reach into the past or future.
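As a rough illustration of how such a rule could be turned into a concrete date, the sketch below resolves a rule to a day of the month for a given year. It rests on assumptions that the example above does not spell out: that Week(n) means the n-th occurrence of the weekday within the month and that Day(0) means Sunday. `RuleDate`, `day_of_week`, and `resolve_day` are hypothetical names, not ICU4X APIs.

```rust
/// Hypothetical in-memory form of one rule boundary (e.g. DSTStart).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RuleDate {
    pub month: u32,   // 1 = January
    pub week: u32,    // assumed: n-th occurrence of `weekday` in the month, from 1
    pub weekday: u32, // assumed: 0 = Sunday ... 6 = Saturday
}

/// Day of the week for a Gregorian date (0 = Sunday), using Sakamoto's method.
/// Only intended for positive years.
pub fn day_of_week(year: i32, month: u32, day: u32) -> u32 {
    const T: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    // January and February count as part of the previous year.
    let y = if month < 3 { year - 1 } else { year };
    (y + y / 4 - y / 100 + y / 400 + T[(month - 1) as usize] + day as i32).rem_euclid(7) as u32
}

/// Resolves the rule to a day of the month in `year`.
/// Does not check that the result still falls inside the month.
pub fn resolve_day(rule: RuleDate, year: i32) -> u32 {
    let first_weekday = day_of_week(year, rule.month, 1);
    // Day of the month on which `rule.weekday` first occurs.
    let first_match = 1 + (rule.weekday + 7 - first_weekday) % 7;
    first_match + 7 * (rule.week - 1)
}
```

Under these assumptions, the uslax DSTStart rule resolves to March 10 in 2024 and the DSTEnd rule to November 3, which matches the actual US transitions for that year.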
Notice: the branch changed across the force-push!
- components/datetime/src/provider/tzdb/serde.rs is different
~ Your Friendly Jira-GitHub PR Checker Bot
CI is failing because --all-keys now requires the tzdb, which technically is a breaking change. We might have to bump datagen to 2.0, although then we lose the version match for baked data.
Hmm, interesting question about what to do when --all-keys requires an all-new data source.
Let's default --tzdb-root to /usr/share/zoneinfo and print a warning when it's used (i.e. on SourceData::tzdb). This way datagen stays usable without compiling your own time zones.
This works for Mac and Linux, but probably not for Windows. Still, better than nothing.
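A minimal sketch of that suggestion, for illustration only: `resolve_tzdb_root` is a hypothetical helper rather than part of the datagen API, and the warning text is made up.

```rust
use std::path::PathBuf;

/// Location of compiled time zone files on most Linux and macOS systems.
const DEFAULT_TZDB_ROOT: &str = "/usr/share/zoneinfo";

/// Hypothetical helper: uses the explicitly supplied root if there is one,
/// otherwise warns on stderr and falls back to the system default.
pub fn resolve_tzdb_root(explicit: Option<PathBuf>) -> PathBuf {
    match explicit {
        Some(path) => path,
        None => {
            eprintln!(
                "warning: no --tzdb-root given, falling back to {}",
                DEFAULT_TZDB_ROOT
            );
            PathBuf::from(DEFAULT_TZDB_ROOT)
        }
    }
}
```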
Notice: the branch changed across the force-push!
- Cargo.lock is different
- components/timezone/Cargo.toml is different
- components/timezone/src/provider/mod.rs is different
- provider/datagen/Cargo.toml is different
- provider/datagen/src/bin/datagen.rs is different
- provider/datagen/src/lib.rs is different
- provider/datagen/src/registry.rs is different
- provider/datagen/src/source.rs is different
- provider/datagen/src/transform/cldr/source.rs is different
- provider/datagen/src/transform/icuexport/collator/mod.rs is different
- provider/datagen/tests/verify-zero-copy.rs is different
- provider/testdata/data/baked/mod.rs is different
- provider/testdata/data/json/fingerprints.csv is different
- provider/testdata/data/postcard/fingerprints.csv is different
- provider/testdata/data/testdata.postcard is no longer changed in the branch
- provider/testdata/data/tzif/Etc/GMT-4 is now changed in the branch
~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push!
- Cargo.lock is different
- provider/testdata/data/json/fingerprints.csv is different
- provider/testdata/data/postcard/fingerprints.csv is different
- provider/testdata/data/testdata.postcard is different
~ Your Friendly Jira-GitHub PR Checker Bot
@robertbastian
I've gone through and responded to all of your feedback.
Functionality has been made experimental, and everything now uses AbstractFs. I've tested this with both zipped and uncompressed data.
I've also defaulted the path to /usr/share/zoneinfo, which is working fine, both zipped and uncompressed.
I had to add some logic that first checks whether a file is intended to be TZif by checking the header (the first 4 bytes) and ignores the file otherwise, since /usr/share/zoneinfo contains other types of files in addition to the TZif files.
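The header check mentioned above might look roughly like this. It relies on the TZif format beginning with the four ASCII bytes "TZif" (RFC 8536). `looks_like_tzif` is a hypothetical name, and the actual logic in this PR may differ.

```rust
/// The four-byte magic number at the start of every TZif file (RFC 8536).
const TZIF_MAGIC: &[u8] = b"TZif";

/// Hypothetical helper: returns true when `bytes` starts with the TZif magic.
/// Files that fail this check (for example plain-text tables that live
/// alongside the compiled zones) can be skipped instead of reported as errors.
pub fn looks_like_tzif(bytes: &[u8]) -> bool {
    bytes.starts_with(TZIF_MAGIC)
}
```

Checking only the magic number keeps the filter cheap. A file that passes can still fail full parsing later, so this is a pre-filter and not a validation step.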
Perhaps it would be best to only have a warning instead of a hard error if the path is unspecified. People may want to use DateTime without time zones, so they shouldn't be required to load time zone data.
Tests are now passing on CI for ubuntu and macos, with the default directory leading to /usr/share/zoneinfo, but Windows is still failing because it doesn't ship with that data.
@sffc @robertbastian
How hard would it be to point our Windows CI job at the testdata directory for only this path when testing datagen?
Merged for ya