Support retrieval of normalized IANA names, including aliases

Open • sffc opened this issue 2 years ago • 2 comments

#4024 allows accessing the canonical IANA identifier for a given time zone. However, it does not support case normalization of non-canonical time zone names. For example, Temporal requires that "asia/calcutta" be echoed back to the user as "Asia/Calcutta".

To implement this, add another data key that contains the normalized strings for all IANA names that are non-canonical. Sort them by their lowercase value, then look them up with a binary search.
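A minimal sketch of that lookup, assuming the new data key can be modeled as a plain sorted slice of `&str`. The constant `NON_CANONICAL_NAMES` and the helpers `cmp_ascii_lowercase` and `normalize_non_canonical` are hypothetical names, and the five entries are just a few real IANA link names used as sample data. The comparator folds ASCII case on the fly, so neither the stored strings nor the request has to be lowercased into a new allocation.

```rust
use std::cmp::Ordering;

/// Orders two ASCII strings as if both were lowercased, without allocating.
fn cmp_ascii_lowercase(a: &str, b: &str) -> Ordering {
    a.bytes()
        .map(|c| c.to_ascii_lowercase())
        .cmp(b.bytes().map(|c| c.to_ascii_lowercase()))
}

/// Hypothetical contents of the proposed data key: a few real IANA link
/// names in their normalized casing, sorted by their lowercase value.
const NON_CANONICAL_NAMES: &[&str] = &[
    "America/Buenos_Aires",
    "Asia/Calcutta",
    "Asia/Saigon",
    "Europe/Kiev",
    "US/Pacific",
];

/// Binary-searches the sorted list case-insensitively and returns the
/// normalized spelling of a non-canonical IANA name, if it is present.
fn normalize_non_canonical(input: &str) -> Option<&'static str> {
    NON_CANONICAL_NAMES
        .binary_search_by(|probe| cmp_ascii_lowercase(probe, input))
        .ok()
        .map(|i| NON_CANONICAL_NAMES[i])
}
```

Because IANA names are ASCII, folding only ASCII case should be enough here. The search returns a `&'static str` slice into the data, so no string is allocated on a hit.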

Expose this as an API such as:

pub struct AllInOneIanaBcp47Thingy {
    // should contain all 3 data keys
}

impl AllInOneIanaBcp47ThingyBorrowed {
    /// Returns the BCP-47 ID and the normalized IANA name of the input
    pub fn lookup_iana(&self, iana: &str) -> Option<(TimeZoneBcp47Id, String)> { ... }
}
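One way the proposed `lookup_iana` could combine the data, sketched under stated assumptions. The "3 data keys" are assumed here to be (1) the existing lowercase IANA name to BCP-47 ID mapping, (2) the BCP-47 ID to canonical IANA name mapping from #4024, and (3) the proposed sorted list of normalized non-canonical names. `HashMap`s and a `Vec` stand in for the real data structures, `TimeZoneBcp47Id` is a hypothetical stand-in rather than ICU4X's actual definition, and the sample entries are hand-written. For brevity the method sits on the owned type; the issue places it on a borrowed variant.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical stand-in for ICU4X's `TimeZoneBcp47Id` (not its real definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TimeZoneBcp47Id(pub &'static str);

/// Orders two ASCII strings as if both were lowercased, without allocating.
fn cmp_ascii_lowercase(a: &str, b: &str) -> Ordering {
    a.bytes()
        .map(|c| c.to_ascii_lowercase())
        .cmp(b.bytes().map(|c| c.to_ascii_lowercase()))
}

/// Sketch of the all-in-one type, holding stand-ins for the three data keys.
pub struct AllInOneIanaBcp47Thingy {
    /// Assumed existing key: lowercase IANA name (canonical or alias) -> BCP-47 ID.
    iana_to_bcp47: HashMap<&'static str, TimeZoneBcp47Id>,
    /// Assumed #4024 key: BCP-47 ID -> canonical IANA name.
    bcp47_to_canonical: HashMap<TimeZoneBcp47Id, &'static str>,
    /// Proposed key: normalized non-canonical names, sorted by lowercase value.
    non_canonical: Vec<&'static str>,
}

impl AllInOneIanaBcp47Thingy {
    /// Tiny hand-written sample data; real data would come from a data provider.
    pub fn sample() -> Self {
        Self {
            iana_to_bcp47: HashMap::from([
                ("asia/kolkata", TimeZoneBcp47Id("inccu")),
                ("asia/calcutta", TimeZoneBcp47Id("inccu")),
                ("america/los_angeles", TimeZoneBcp47Id("uslax")),
                ("us/pacific", TimeZoneBcp47Id("uslax")),
            ]),
            bcp47_to_canonical: HashMap::from([
                (TimeZoneBcp47Id("inccu"), "Asia/Kolkata"),
                (TimeZoneBcp47Id("uslax"), "America/Los_Angeles"),
            ]),
            non_canonical: vec!["Asia/Calcutta", "US/Pacific"],
        }
    }

    /// Returns the BCP-47 ID and the normalized IANA name of the input.
    pub fn lookup_iana(&self, iana: &str) -> Option<(TimeZoneBcp47Id, String)> {
        // Key 1: case-insensitive lookup by lowercasing the request.
        let id = *self.iana_to_bcp47.get(iana.to_ascii_lowercase().as_str())?;
        // Key 2: if the input is the canonical name, that data is already normalized.
        let canonical = self.bcp47_to_canonical.get(&id)?;
        if canonical.eq_ignore_ascii_case(iana) {
            return Some((id, canonical.to_string()));
        }
        // Key 3: otherwise it is an alias; binary-search for its normalized spelling.
        let i = self
            .non_canonical
            .binary_search_by(|probe| cmp_ascii_lowercase(probe, iana))
            .ok()?;
        Some((id, self.non_canonical[i].to_string()))
    }
}
```

The alias list only needs the non-canonical names, since canonical names can already be recovered in their normalized form through the BCP-47 ID.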

sffc, Sep 13 '23 07:09

Can we use the AsciiTrie itself to store the canonical identifiers? IIUC it currently stores lowercase identifiers, and we convert requests to lowercase before doing a lookup. Instead, it could store the canonical identifiers and provide a case-insensitive lookup function. To do this, at every node it checks the input character first, and tries the other case if there's no match.
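A toy illustration of the suggested case-insensitive walk, assuming a simple byte trie built from a `BTreeMap` per node. This `Node` type is a hypothetical stand-in; the real AsciiTrie has a different API and storage layout. Keys are stored in their normalized casing. At each node the walk follows the input byte as given and falls back to the other ASCII case only when there is no match, without backtracking. The `u32` value is a placeholder for whatever the trie would map to.

```rust
use std::collections::BTreeMap;

/// Toy byte trie storing keys in their original (normalized) casing.
/// Hypothetical stand-in; the real AsciiTrie has a different API and layout.
#[derive(Default)]
pub struct Node {
    children: BTreeMap<u8, Node>,
    value: Option<u32>,
}

impl Node {
    pub fn insert(&mut self, key: &str, value: u32) {
        let mut node = self;
        for b in key.bytes() {
            node = node.children.entry(b).or_default();
        }
        node.value = Some(value);
    }

    /// Case-insensitive lookup: at each node, follow the input byte as given,
    /// else its other ASCII case. No backtracking, so this is only correct if
    /// no node has two children that differ only in case.
    pub fn get_case_insensitive(&self, input: &str) -> Option<(String, u32)> {
        if !input.is_ascii() {
            return None;
        }
        let mut node = self;
        // The stored casing is rebuilt byte by byte, so this allocates.
        let mut stored = String::with_capacity(input.len());
        for b in input.bytes() {
            let other = if b.is_ascii_lowercase() {
                b.to_ascii_uppercase()
            } else {
                b.to_ascii_lowercase()
            };
            let (byte, next) = match node.children.get(&b) {
                Some(next) => (b, next),
                None => (other, node.children.get(&other)?),
            };
            stored.push(char::from(byte));
            node = next;
        }
        node.value.map(|v| (stored, v))
    }
}
```

Recovering the normalized name this way means building a `String` during the walk rather than returning a slice of stored data.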

robertbastian, Sep 14 '23 09:09

Discussion:

  • @sffc I have considered that, but you'd need to backtrack if you take the wrong branch, and that gets complicated
  • @robertbastian I'd just disallow two different casings of the same character at the same level
  • @sffc there might be collisions; let me check the data... I think it could work, and it shouldn't break in the future because new zones are all titlecased
  • @sffc it does not solve the normalization problem, and we'd need to allocate a string during lookup instead of returning a slice
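The "disallow two different casings of the same character at the same level" rule could be checked over the name list, for example at data-generation time. A sketch, assuming the names are available as a slice of `&str`; `find_case_collision` is a hypothetical helper. Two names end up as sibling nodes that differ only in case exactly when their first differing byte is the same letter in a different case, so comparing first differences pairwise is enough.

```rust
/// Returns the first pair of names that a no-backtracking, case-insensitive
/// trie walk could confuse: equal up to some byte, then differing only in
/// ASCII case (i.e. sibling trie nodes such as 'a' and 'A').
pub fn find_case_collision<'a>(names: &[&'a str]) -> Option<(&'a str, &'a str)> {
    for (i, &a) in names.iter().enumerate() {
        for &b in &names[i + 1..] {
            // First byte position where the two names differ, if any.
            let first_diff = a.bytes().zip(b.bytes()).find(|(x, y)| x != y);
            if let Some((x, y)) = first_diff {
                if x.eq_ignore_ascii_case(&y) {
                    return Some((a, b));
                }
            }
        }
    }
    None
}
```

The pairwise loop is quadratic, which is fine for a build-time check over a few hundred names. It only covers the collision concern; it does not address the allocation point.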

Conclusion: @sffc will think about this a little more tonight and may find a way to make it work; otherwise, go ahead as originally proposed.

robertbastian, Sep 14 '23 17:09