icu4x
Support retrieval of normalized IANA names, including aliases
#4024 allows for accessing the canonical IANA identifier for a given time zone. However, it does not allow case normalization of non-canonical time zones. For example, Temporal requires that "asia/calcutta" be echoed back to the user as "Asia/Calcutta".
To implement this, add another data key that contains the normalized strings for all IANA names that are non-canonical. Sort them by their lowercase value, then look them up with a binary search.
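A minimal sketch of that lookup, assuming the new data key can be viewed as a plain slice of normalized names already sorted by their lowercase value. The names `cmp_ignore_ascii_case` and `normalize_iana` are hypothetical and not part of ICU4X, and a real implementation would presumably read from a zero-copy data struct rather than a `&[&str]`.

```rust
use std::cmp::Ordering;

/// Compares two ASCII strings byte by byte, ignoring ASCII case.
/// Lowercasing happens on the fly, so nothing is allocated.
fn cmp_ignore_ascii_case(a: &str, b: &str) -> Ordering {
    a.bytes()
        .map(|c| c.to_ascii_lowercase())
        .cmp(b.bytes().map(|c| c.to_ascii_lowercase()))
}

/// Looks up `input` case-insensitively and returns the normalized name.
///
/// Assumption: `table` holds the normalized spellings of the
/// non-canonical IANA names and is sorted by lowercase value.
fn normalize_iana<'a>(table: &[&'a str], input: &str) -> Option<&'a str> {
    table
        .binary_search_by(|entry| cmp_ignore_ascii_case(entry, input))
        .ok()
        .map(|i| table[i])
}
```

With a table such as `["Asia/Calcutta", "Europe/Kiev", "US/Pacific"]`, calling `normalize_iana(&table, "asia/calcutta")` returns `Some("Asia/Calcutta")`. Because the stored strings already carry the normalized casing, the result is a borrowed slice of the data.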
Expose this as an API such as:

    pub struct AllInOneIanaBcp47Thingy {
        // should contain all 3 data keys
    }

    impl AllInOneIanaBcp47ThingyBorrowed {
        /// Returns the BCP-47 ID and the normalized IANA name of the input
        pub fn lookup_iana(&self, iana: &str) -> Option<(TimeZoneBcp47Id, String)> { ... }
    }
Can we use the AsciiTrie itself to store the canonical identifiers? IIUC it currently stores lowercase identifiers, and we convert requests to lowercase before doing a lookup. Instead, it could store the canonical identifiers and provide a case-insensitive lookup function. To do this, at every node it checks the input character first, and tries the other case if there's no match.
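To make the idea concrete, here is a rough sketch of such a lookup over a toy trie. The `Node` type and its methods are hypothetical and far simpler than ICU4X's actual trie; the sketch assumes ASCII-only keys and that no node has two children differing only in case, so it never backtracks. It rebuilds the stored spelling into an owned `String` as it walks.

```rust
/// Hypothetical, simplified trie node; not ICU4X's actual trie type.
#[derive(Default)]
struct Node {
    /// Child edges, keyed by the byte exactly as it was inserted.
    children: Vec<(u8, Node)>,
    /// True if a stored identifier ends at this node.
    terminal: bool,
}

impl Node {
    /// Inserts an identifier with its canonical casing preserved.
    fn insert(&mut self, key: &str) {
        let mut node = self;
        for &b in key.as_bytes() {
            let idx = match node.children.iter().position(|(c, _)| *c == b) {
                Some(i) => i,
                None => {
                    node.children.push((b, Node::default()));
                    node.children.len() - 1
                }
            };
            node = &mut node.children[idx].1;
        }
        node.terminal = true;
    }

    /// Returns the stored byte and child node for an exact byte match.
    fn child(&self, b: u8) -> Option<(u8, &Node)> {
        self.children
            .iter()
            .find(|(c, _)| *c == b)
            .map(|(c, n)| (*c, n))
    }

    /// Case-insensitive lookup: try the input byte first, then its
    /// other ASCII case. Assumes ASCII keys and no case-only sibling
    /// collisions, so the first match is the only possible match.
    fn lookup_ignore_case(&self, input: &str) -> Option<String> {
        let mut node = self;
        let mut out = String::with_capacity(input.len());
        for &b in input.as_bytes() {
            let flipped = if b.is_ascii_lowercase() {
                b.to_ascii_uppercase()
            } else {
                b.to_ascii_lowercase()
            };
            let (stored, next) = node.child(b).or_else(|| node.child(flipped))?;
            // Record the casing stored in the trie, not the input casing.
            out.push(stored as char);
            node = next;
        }
        node.terminal.then_some(out)
    }
}
```

After inserting `"Asia/Calcutta"`, a call to `lookup_ignore_case("asia/calcutta")` yields `Some("Asia/Calcutta".to_string())`. If two siblings did differ only in case, this sketch could follow the wrong branch and miss a valid entry.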
Discussion:
- @sffc I have considered that, but you'd need to backtrack if you take the wrong branch and it gets complicated
- @robertbastian I'd just disallow two different casings of the same character at the same level
- @sffc there might be collisions, let me check the data... I think it could work and shouldn't break in the future because new zones are all titlecased
- @sffc it does not solve the normalization problem, and we need to allocate a string during lookup instead of returning a slice
Conclusion: @sffc to think about this a little bit more tonight. If he can make it work, use the trie approach; otherwise go ahead as originally proposed.