icu4x
icu4x copied to clipboard
Make docs for the 3 data provider constructors super clear
From @zbraniecki and @robertbastian
The three types of provider constructors are documented here:
https://unicode-org.github.io/icu4x-docs/doc/icu_provider/index.html#icu4x-constructors-and-data-versioning-policy
I have a macro that generates the _with_any_provider and _with_buffer_provider constructors. That macro should include more details in the doc strings of those generated functions and point to the full documentation page linked above.
We may also want to include boilerplate on the _unsafe constructor, but this can't be auto-generated as easily.
@zbraniecki and/or @robertbastian, would you like to suggest some prose for this?
Roughly:
_unstable: This function is unstable in the way that its data provider bounds are not guaranteed to be stable across even minor releases. If you don't need this stability (i.e. because you're using BakedDataProvider), prefer this as it has the best performance. Otherwise see buffer/any.
_buffer: This version of this function accepts BufferProvider, which comes with additional deserialization overhead. However, it is guaranteed to work with any data provider generated for this major release.
_any: This version of this function accepts AnyProvider, which comes with additional dynamic dispatch overhead. However, it is guaranteed to work with any data provider generated for this major release.
I still think we should not automatically implement ResourceProvider for buffer/any. That would make the distinction clearer
...which comes with additional deserialization overhead
I'd like to clarify this in the sense that there is no additional overhead relative to if you are already using a BufferProvider and feeding it to _unstable. The additional overhead of BufferProvider is about the fact that you are using BufferProvider, not that you are using this constructor.
...which comes with additional dynamic dispatch overhead
There is no dynamic dispatch. We are downcasting a dyn Any to a concrete type, which requires checking the TypeId, but not traversing any vtables (I think). The code in the standard library is literally just
impl dyn Any {
#[inline]
pub fn is<T: Any>(&self) -> bool {
// Get `TypeId` of the type this function is instantiated with.
let t = TypeId::of::<T>();
// Get `TypeId` of the type in the trait object (`self`).
let concrete = self.type_id();
// Compare both `TypeId`s on equality.
t == concrete
}
#[inline]
pub fn downcast_ref<T: Any>(&self) -> Option<&T> {
if self.is::<T>() {
// SAFETY: just checked whether we are pointing to the correct type, and we can rely on
// that check for memory safety because we have implemented Any for all types; no other
// impls can exist as they would conflict with our impl.
unsafe { Some(self.downcast_ref_unchecked()) }
} else {
None
}
}
#[inline]
pub unsafe fn downcast_ref_unchecked<T: Any>(&self) -> &T {
debug_assert!(self.is::<T>());
// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }
}
}
I still think we should not automatically implement ResourceProvider for buffer/any. That would make the distinction clearer
I don't understand this sentence
- I assume you mean
DataProvider, notResourceProvider - We don't have blanket impls for any of the four main data provider traits. We always implement them on concrete types. This is why you need to call
.as_deserializing()or.as_downcasting(). - If you meant the other way around, the reason we implement AnyProvider on things that have many manual DataProvider impls (DatagenProvider and BakedDataProvider) is because the only way to implement AnyProvider on those types is to generate a big match statement, which requires knowledge of the set of keys in the provider, which is information that the maintainers have.
How about this:
foo_unstable
/// ... main docs ...
///
/// # Stability
///
/// The bounds of this function may not remain stable across minor releases. [Details](icu_provider)
///
/// # Examples
///
/// (an example using _with_buffer_provider)
foo_with_buffer_provider
/// Create a new instance using an [`BufferProvider`].
///
/// For more information on the behavior of this function, see [`Self::foo_unstable`].
///
/// For more information on the ICU4X constructor structure, see [`icu_provider`].
foo_with_any_provider (same as above except for link to AnyProvider)
Yeah it's not technically dynamic dispatch, but still it runtime-dispatches between many match cases, which has a cost, and then has some Any overhead as well.
I'd like to clarify this in the sense that there is no additional overhead relative to if you are already using a BufferProvider and feeding it to _unstable.
I'm aware. That's why I'm saying that e.g. BlobDataProvider should not implement DataProvider directly. Users can call as_deserializing() to pass it to _unstable, while having the deserialization and it's overhead explicit.
If you meant the other way around, the reason we implement AnyProvider on things that have many manual DataProvider impls (DatagenProvider and BakedDataProvider) is because the only way to implement AnyProvider on those types is to generate a big match statement, which requires knowledge of the set of keys in the provider, which is information that the maintainers have.
Implementing AnyProvider for something that's inherently a DataProvider seems fine, just the other way around shouldn't be done (and we're not doing it). If the DataProvider implementation delegates to the AnyProvider implementation, we shouldn't hide this from the user.
To clarify I'm advocating against the icu_provider::impl_auto_deserializing! macro.
@zbraniecki Can you weigh in here?
What is the ask here? I'm using the new constructor signatures right now writing perf harness and find it unintuitive, but I'm not sure how it translates to a question of using a macro or not.
My STR is:
- Go to API docs for
DateTimeFormatter - Look for typical constructor (
::new,::try_new,::default) -> doesn't exist - Instead I see four
try_new_*constructors. Two with scary words like "unstable", "experimental", two withanyvsbufferprovider - I assume I understand the need for provider, but am unfamiliar with differences between providers. - It prompts me to scroll to the top and look at the example, just to copy&paste something and start using it before I dive into learning about different providers and how to chose the right one for me.
- The example at the top uses
try_new_unstable. Fine, let me try it. - But
try_new_unstabledoesn't work with theStaticDataProviderthat I have. -> I'm confused - The docs for
StaticDataProviderdo not come with an example that would help me understand how to use it inDateTimeFormatter. - I'm asking around and learning that for
StaticDataProviderI should usetry_new_with_buffer_provider - I try and get compiler telling me that no such method exists and maybe I want to use
try_new_with_any_provider. Maybe I do? I don't know! Buttry_new_with_buffer_provideris in the API docs, so why it doesn't work? I check spelling multiple times. - Eventually I look into the source and find out I need
features = ["serde"]. - Finally it works.
- I then go to find where can I read about different providers. It's not linked from every provider/formatter so I have to find it.
Unpacking it, I see several potential DX improvements:
- Since constructing any formatter requires a decision on provider, each formatter's entry API doc should have a section on "selecting provider" with a link to
icu_providerentry docs. I now see thatDateTimeFormatpage has a sentence with a link, but it doesn't call out the requirement of the user to choose a provider, which makes it less likely I'll notice it and read through. It also doesn't help that the link doesn't work. - Reconsider naming -
try_new_unstablecould betry_new_with_versioned_providerbecause it is for versioned data that matches the version of ICU4X. - Either use in examples provider that can be then used in my code or explicitly call out that
testdataworks with one constructor, but all providers people will want to use in their work will require a different one. - Call out that the constructor requires
serdefeature. (I thought it'll be rustdoc feature to show that?) - Revisit
icu_providerentry doc since this is going to be linked from everywhere where anyone can stumble and has to make sense of how to chose the right provider. The doc should: 5.1. Have a short explainer on why we need to make it complicated, and why developer needs to make conscious decisions. The value proposition of having such sophisticated data provider. It can start with acknowledgement that Unicode's experience with usage of ICU led us to realize that data management is the most critical aspect of deploying i18n and requires high level of customization for the needs of the platform it is embedded in. In result ICU4X comes with a selection of providers that should allow for ICU4X to naturally fit into different business and technological needs of customers. 5.2. Provide some decision chart, maybe similar tostd::collections5.3. For each type of provider in its main doc show an example of usage of that provider in a component. (to avoid poor souls trying to maketry_new_unstablework withStaticDataProvideretc.) 5.4. Fix the link toBakedDataProviderinicu_providermain doc.
This is excellent feedback. I'll work on a PR to address the pain points you found.