icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Make docs for the 3 data provider constructors super clear

Open sffc opened this issue 3 years ago • 8 comments

From @zbraniecki and @robertbastian

The three types of provider constructors are documented here:

https://unicode-org.github.io/icu4x-docs/doc/icu_provider/index.html#icu4x-constructors-and-data-versioning-policy

I have a macro that generates the _with_any_provider and _with_buffer_provider constructors. That macro should include more details in the doc strings of those generated functions and point to the full documentation page linked above.

We may also want to include boilerplate on the _unsafe constructor, but this can't be auto-generated as easily.

@zbraniecki and/or @robertbastian, would you like to suggest some prose for this?

sffc avatar Aug 05 '22 15:08 sffc

Roughly:

_unstable: This function is unstable in the way that its data provider bounds are not guaranteed to be stable across even minor releases. If you don't need this stability (i.e. because you're using BakedDataProvider), prefer this as it has the best performance. Otherwise see buffer/any.

_buffer: This version of this function accepts BufferProvider, which comes with additional deserialization overhead. However, it is guaranteed to work with any data provider generated for this major release.

_any: This version of this function accepts AnyProvider, which comes with additional dynamic dispatch overhead. However, it is guaranteed to work with any data provider generated for this major release.

I still think we should not automatically implement ResourceProvider for buffer/any. That would make the distinction clearer

robertbastian avatar Aug 05 '22 18:08 robertbastian

...which comes with additional deserialization overhead

I'd like to clarify this in the sense that there is no additional overhead relative to if you are already using a BufferProvider and feeding it to _unstable. The additional overhead of BufferProvider is about the fact that you are using BufferProvider, not that you are using this constructor.

...which comes with additional dynamic dispatch overhead

There is no dynamic dispatch. We are downcasting a dyn Any to a concrete type, which requires checking the TypeId, but not traversing any vtables (I think). The code in the standard library is literally just

impl dyn Any {
    #[inline]
    pub fn is<T: Any>(&self) -> bool {
        // Get `TypeId` of the type this function is instantiated with.
        let t = TypeId::of::<T>();

        // Get `TypeId` of the type in the trait object (`self`).
        let concrete = self.type_id();

        // Compare both `TypeId`s on equality.
        t == concrete
    }

    #[inline]
    pub fn downcast_ref<T: Any>(&self) -> Option<&T> {
        if self.is::<T>() {
            // SAFETY: just checked whether we are pointing to the correct type, and we can rely on
            // that check for memory safety because we have implemented Any for all types; no other
            // impls can exist as they would conflict with our impl.
            unsafe { Some(self.downcast_ref_unchecked()) }
        } else {
            None
        }
    }

    #[inline]
    pub unsafe fn downcast_ref_unchecked<T: Any>(&self) -> &T {
        debug_assert!(self.is::<T>());
        // SAFETY: caller guarantees that T is the correct type
        unsafe { &*(self as *const dyn Any as *const T) }
    }
}

I still think we should not automatically implement ResourceProvider for buffer/any. That would make the distinction clearer

I don't understand this sentence

  • I assume you mean DataProvider, not ResourceProvider
  • We don't have blanket impls for any of the four main data provider traits. We always implement them on concrete types. This is why you need to call .as_deserializing() or .as_downcasting().
  • If you meant the other way around, the reason we implement AnyProvider on things that have many manual DataProvider impls (DatagenProvider and BakedDataProvider) is because the only way to implement AnyProvider on those types is to generate a big match statement, which requires knowledge of the set of keys in the provider, which is information that the maintainers have.

sffc avatar Aug 06 '22 06:08 sffc

How about this:

foo_unstable

/// ... main docs ...
///
/// # Stability
///
/// The bounds of this function may not remain stable across minor releases. [Details](icu_provider)
///
/// # Examples
///
/// (an example using _with_buffer_provider)

foo_with_buffer_provider

/// Create a new instance using an [`BufferProvider`].
///
/// For more information on the behavior of this function, see [`Self::foo_unstable`].
///
/// For more information on the ICU4X constructor structure, see [`icu_provider`].

foo_with_any_provider (same as above except for link to AnyProvider)

sffc avatar Aug 06 '22 06:08 sffc

Yeah it's not technically dynamic dispatch, but still it runtime-dispatches between many match cases, which has a cost, and then has some Any overhead as well.

I'd like to clarify this in the sense that there is no additional overhead relative to if you are already using a BufferProvider and feeding it to _unstable.

I'm aware. That's why I'm saying that e.g. BlobDataProvider should not implement DataProvider directly. Users can call as_deserializing() to pass it to _unstable, while having the deserialization and it's overhead explicit.

If you meant the other way around, the reason we implement AnyProvider on things that have many manual DataProvider impls (DatagenProvider and BakedDataProvider) is because the only way to implement AnyProvider on those types is to generate a big match statement, which requires knowledge of the set of keys in the provider, which is information that the maintainers have.

Implementing AnyProvider for something that's inherently a DataProvider seems fine, just the other way around shouldn't be done (and we're not doing it). If the DataProvider implementation delegates to the AnyProvider implementation, we shouldn't hide this from the user.

robertbastian avatar Aug 10 '22 07:08 robertbastian

To clarify I'm advocating against the icu_provider::impl_auto_deserializing! macro.

robertbastian avatar Aug 10 '22 07:08 robertbastian

@zbraniecki Can you weigh in here?

sffc avatar Aug 19 '22 07:08 sffc

What is the ask here? I'm using the new constructor signatures right now writing perf harness and find it unintuitive, but I'm not sure how it translates to a question of using a macro or not.

My STR is:

  1. Go to API docs for DateTimeFormatter
  2. Look for typical constructor (::new, ::try_new, ::default) -> doesn't exist
  3. Instead I see four try_new_* constructors. Two with scary words like "unstable", "experimental", two with any vs buffer provider - I assume I understand the need for provider, but am unfamiliar with differences between providers.
  4. It prompts me to scroll to the top and look at the example, just to copy&paste something and start using it before I dive into learning about different providers and how to chose the right one for me.
  5. The example at the top uses try_new_unstable. Fine, let me try it.
  6. But try_new_unstable doesn't work with the StaticDataProvider that I have. -> I'm confused
  7. The docs for StaticDataProvider do not come with an example that would help me understand how to use it in DateTimeFormatter.
  8. I'm asking around and learning that for StaticDataProvider I should use try_new_with_buffer_provider
  9. I try and get compiler telling me that no such method exists and maybe I want to use try_new_with_any_provider. Maybe I do? I don't know! But try_new_with_buffer_provider is in the API docs, so why it doesn't work? I check spelling multiple times.
  10. Eventually I look into the source and find out I need features = ["serde"].
  11. Finally it works.
  12. I then go to find where can I read about different providers. It's not linked from every provider/formatter so I have to find it.

Unpacking it, I see several potential DX improvements:

  1. Since constructing any formatter requires a decision on provider, each formatter's entry API doc should have a section on "selecting provider" with a link to icu_provider entry docs. I now see that DateTimeFormat page has a sentence with a link, but it doesn't call out the requirement of the user to choose a provider, which makes it less likely I'll notice it and read through. It also doesn't help that the link doesn't work.
  2. Reconsider naming - try_new_unstable could be try_new_with_versioned_provider because it is for versioned data that matches the version of ICU4X.
  3. Either use in examples provider that can be then used in my code or explicitly call out that testdata works with one constructor, but all providers people will want to use in their work will require a different one.
  4. Call out that the constructor requires serde feature. (I thought it'll be rustdoc feature to show that?)
  5. Revisit icu_provider entry doc since this is going to be linked from everywhere where anyone can stumble and has to make sense of how to chose the right provider. The doc should: 5.1. Have a short explainer on why we need to make it complicated, and why developer needs to make conscious decisions. The value proposition of having such sophisticated data provider. It can start with acknowledgement that Unicode's experience with usage of ICU led us to realize that data management is the most critical aspect of deploying i18n and requires high level of customization for the needs of the platform it is embedded in. In result ICU4X comes with a selection of providers that should allow for ICU4X to naturally fit into different business and technological needs of customers. 5.2. Provide some decision chart, maybe similar to std::collections 5.3. For each type of provider in its main doc show an example of usage of that provider in a component. (to avoid poor souls trying to make try_new_unstable work with StaticDataProvider etc.) 5.4. Fix the link to BakedDataProvider in icu_provider main doc.

zbraniecki avatar Sep 16 '22 16:09 zbraniecki

This is excellent feedback. I'll work on a PR to address the pain points you found.

sffc avatar Sep 16 '22 16:09 sffc