Finalize the DisplayNames component
- [ ] Improve the data
- [x] #6920
- [ ] #3260
- [ ] Support data provider that can load multiple attributes in a single load
- [ ] Improve the algorithm
- [ ] Load display names patterns from data, without hard-coding `(`, `,`, `)`, etc.
- [ ] Work with CLDR to verify the correctness of the algorithm and possibly submit improvements for CLDR 49.
- [ ] Consider whether `find_longest_matching_subtag` needs to return language-script-region or something with variants
- [ ] Make sure "es-Latn-419" returns "Latin American Spanish (Latin)"
- [ ] Add features to the API
- [ ] Support dialect option that can be turned on or off
- [ ] Large graduation checklist items
- [ ] FFI
- [ ] Docs
- [ ] Full graduation checklist (post after all above tasks are done)
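
To make the algorithm items above concrete, here is a rough sketch of the "longest match first, remaining subtags in parentheses" idea, with the pattern and separator read from data rather than hard-coded. Everything in it is an assumption for illustration: `NameData`, `SAMPLE`, `lookup`, and `display_name` are hypothetical names, the tables are hand-written rather than taken from CLDR, and this is not the current ICU4X implementation.

```rust
/// Hypothetical, hand-written stand-in for display-name data.
/// The pattern and separator live in the data, not in the formatting code.
pub struct NameData {
    /// Keyed by language or language-region (dialect), e.g. "es" or "es-419".
    pub languages: &'static [(&'static str, &'static str)],
    pub scripts: &'static [(&'static str, &'static str)],
    pub regions: &'static [(&'static str, &'static str)],
    /// "{0}" is the base name, "{1}" is the joined list of qualifiers.
    pub locale_pattern: &'static str,
    pub separator: &'static str,
}

/// Illustrative sample values only.
pub const SAMPLE: NameData = NameData {
    languages: &[("es", "Spanish"), ("es-419", "Latin American Spanish")],
    scripts: &[("Latn", "Latin")],
    regions: &[("419", "Latin America"), ("ES", "Spain")],
    locale_pattern: "{0} ({1})",
    separator: ", ",
};

fn lookup(table: &[(&'static str, &'static str)], key: &str) -> Option<&'static str> {
    table.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// Formats language + optional script + optional region.
/// Returns `None` if any requested subtag has no name in `data`.
pub fn display_name(
    data: &NameData,
    language: &str,
    script: Option<&str>,
    region: Option<&str>,
    use_dialect: bool,
) -> Option<String> {
    // Longest match first: try "language-region" as a dialect name.
    let dialect = if use_dialect {
        region.and_then(|r| lookup(data.languages, &format!("{language}-{r}")))
    } else {
        None
    };
    // If the dialect name matched, the region is already consumed.
    let (base, region_left) = match dialect {
        Some(name) => (name, None),
        None => (lookup(data.languages, language)?, region),
    };
    // Whatever was not consumed by the base name becomes a qualifier.
    let mut qualifiers: Vec<&str> = Vec::new();
    if let Some(s) = script {
        qualifiers.push(lookup(data.scripts, s)?);
    }
    if let Some(r) = region_left {
        qualifiers.push(lookup(data.regions, r)?);
    }
    if qualifiers.is_empty() {
        return Some(base.to_string());
    }
    let joined = qualifiers.join(data.separator);
    Some(data.locale_pattern.replace("{0}", base).replace("{1}", &joined))
}
```

With this sample data, `display_name(&SAMPLE, "es", Some("Latn"), Some("419"), true)` gives "Latin American Spanish (Latin)", and passing `false` for `use_dialect` gives "Spanish (Latin, Latin America)". Variants and nested parentheses are deliberately left out of the sketch.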
Notes from a discussion with @sffc and @Manishearth:
- The attributes are highly unstable. Regions change, languages change. So we shouldn't make separate markers for each one. We should stick with the attributes.
- The initial minimal DisplayNames API is a free function or thin formatter struct that loads a single display name.
- We want to scale to support efficiently loading multiple display names for a single locale. There are multiple ways to implement this. The one that @sffc and @Manishearth think is most promising is a new trait living alongside DataProvider that takes a list of attributes and returns a structure that efficiently returns them. Caveats:
- It should be zero-copy on Bake and Postcard providers. Bake can generate extra impls. Postcard should be fine as it does something like this for IterableDataProvider.
- For mutable providers, like a network provider, we can only return a "snapshot", since additional data can be loaded into the provider as the program runs. So, the impl for that type of provider probably needs to allocate a Vec.
- @Manishearth might post more thoughts below.
- There are open questions on the display names algorithm coming from CLDR, such as what to do with nested parentheses. This impacts the data model. We should therefore seek answers to those questions early in the process. Answering them can be done in parallel with the ZeroTrie optimization for smaller storage of data marker attributes.
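
For reference, the "new trait alongside DataProvider" idea from the notes above might look roughly like the sketch below. All names here (`MultiAttributeProvider`, `load_many`, `StaticProvider`, `MutableProvider`) are hypothetical and not part of ICU4X, and payloads are plain strings to keep it short. The point is only the ownership split: a static backend can hand out borrowed data, while a mutable backend has to copy a snapshot.

```rust
use std::borrow::Cow;
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical trait: loads several attributes for one locale in one call.
pub trait MultiAttributeProvider {
    /// One entry per requested attribute, `None` where nothing was found.
    fn load_many<'a>(&'a self, locale: &str, attributes: &[&str]) -> Vec<Option<Cow<'a, str>>>;
}

/// Stand-in for a baked/static backend: (locale, attribute, value) rows.
pub struct StaticProvider {
    entries: &'static [(&'static str, &'static str, &'static str)],
}

impl StaticProvider {
    pub fn new(entries: &'static [(&'static str, &'static str, &'static str)]) -> Self {
        Self { entries }
    }
}

impl MultiAttributeProvider for StaticProvider {
    fn load_many<'a>(&'a self, locale: &str, attributes: &[&str]) -> Vec<Option<Cow<'a, str>>> {
        attributes
            .iter()
            .map(|attr| {
                self.entries
                    .iter()
                    .find(|(l, a, _)| *l == locale && *a == *attr)
                    // Borrowed from static data: no copy is made.
                    .map(|(_, _, v)| Cow::Borrowed(*v))
            })
            .collect()
    }
}

/// Stand-in for a mutable backend (e.g. one filled from the network).
#[derive(Default)]
pub struct MutableProvider {
    cache: Mutex<HashMap<(String, String), String>>,
}

impl MutableProvider {
    pub fn insert(&self, locale: &str, attribute: &str, value: &str) {
        let key = (locale.to_string(), attribute.to_string());
        self.cache.lock().unwrap().insert(key, value.to_string());
    }
}

impl MultiAttributeProvider for MutableProvider {
    fn load_many<'a>(&'a self, locale: &str, attributes: &[&str]) -> Vec<Option<Cow<'a, str>>> {
        let cache = self.cache.lock().unwrap();
        let snapshot = attributes
            .iter()
            .map(|attr| {
                let key = (locale.to_string(), attr.to_string());
                // The data can change after the lock is released,
                // so the result must be an owned copy (a snapshot).
                cache.get(&key).map(|v| Cow::Owned(v.clone()))
            })
            .collect();
        snapshot
    }
}
```

Using `Cow` is just one way to let both kinds of backend share a signature; a dedicated return type that wraps either a borrowed trie or an owned `Vec` would serve the same purpose.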
> a new trait living alongside DataProvider that takes a list of attributes and returns a structure that efficiently returns them
I wasn't necessarily thinking about this in terms of a new trait (though a trait might be involved). I was thinking that we have e.g. a DisplayNames marker that has attributes, and then a DisplayNamesAll marker that has no attributes and returns e.g. a ZeroTrie<DisplayNamesData> or something. Our data marker metadata links the two:
- Baked data generates an additional DisplayNamesAll impl that returns the whole trie
- Postcard uses the metadata to slice the ZeroTrie (or something similar).
- Other backends are free to do other things, and they can also choose to not support this marker.
This is a very rough design but I think it could work. However, if a trait-based solution sounds good I'm open to that too; I just haven't thought through the implications.
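
A very rough sketch of the two-marker shape described above, to make it easier to poke holes in. The `Marker` and `Provider` traits are simplified, hypothetical stand-ins for the real data marker and provider traits, `AttributeTable` is a sorted slice standing in for a ZeroTrie, and locales are ignored entirely.

```rust
/// Hypothetical stand-in for a ZeroTrie: a static table sorted by attribute.
#[derive(Debug, Clone, Copy)]
pub struct AttributeTable(pub &'static [(&'static str, &'static str)]);

impl AttributeTable {
    pub fn get(&self, attribute: &str) -> Option<&'static str> {
        self.0
            .binary_search_by(|(k, _)| k.cmp(&attribute))
            .ok()
            .map(|i| self.0[i].1)
    }
}

/// Simplified marker trait: each marker names its payload type.
pub trait Marker {
    type Payload;
}

/// Marker with attributes: one display name per attribute.
pub struct DisplayNames;
impl Marker for DisplayNames {
    type Payload = &'static str;
}

/// Marker without attributes: the whole table at once.
pub struct DisplayNamesAll;
impl Marker for DisplayNamesAll {
    type Payload = AttributeTable;
}

/// Simplified provider trait, generic over the marker.
pub trait Provider<M: Marker> {
    fn load(&self, attribute: &str) -> Option<M::Payload>;
}

/// Stand-in for baked data: supports both markers.
pub struct BakedBackend {
    pub table: AttributeTable,
}

impl Provider<DisplayNames> for BakedBackend {
    fn load(&self, attribute: &str) -> Option<&'static str> {
        self.table.get(attribute)
    }
}

// The "additional impl" that baked data would generate.
impl Provider<DisplayNamesAll> for BakedBackend {
    fn load(&self, attribute: &str) -> Option<AttributeTable> {
        // This marker has no attributes, so only the empty attribute is valid.
        if attribute.is_empty() { Some(self.table) } else { None }
    }
}

/// A backend that opts out of DisplayNamesAll by not implementing it.
pub struct SingleOnlyBackend;

impl Provider<DisplayNames> for SingleOnlyBackend {
    fn load(&self, attribute: &str) -> Option<&'static str> {
        if attribute == "fr" { Some("French") } else { None }
    }
}
```

In this shape, "no support" is a missing trait impl, so `Provider::<DisplayNamesAll>::load(&SingleOnlyBackend, "")` fails to compile rather than failing at runtime.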
@Manishearth does your DisplayNamesAll involve returning an enumeration over different data provider sources, since they might not all be backed by a ZeroTrie, or were you thinking that non-ZT backends are just unsupported? (Or perhaps they need to build the ZT on the spot?)
Good question, I was thinking the latter (no support), but I can see the other model working as well.
The problem with "no support" is that things get complicated for buffer.