icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Reconsider name of unstable constructors

Open sffc opened this issue 3 years ago • 16 comments
trafficstars

@zbraniecki suggested that we could rename our unstable constructors from _unstable to _with_versioned_provider.

I'd like to discuss a few options:

  1. _unstable
  2. _with_versioned_provider
  3. _with_unstable_provider

I lean toward (3). The issue is that we are already on shaky ground making these functions public since they don't obey SemVer guarantees, so we should have a scary word in the constructor. However, I would like the consistency of naming them alongside _with_any_provider and _with_buffer_provider.

OK with _with_unstable_provider?

  • [ ] @zbraniecki
  • [ ] @Manishearth
  • [ ] @robertbastian

sffc avatar Sep 17 '22 01:09 sffc

In my comment I made an argument that unstable has a negative connotation that is not intended - it makes it sound like you're working with something that may change from under you.

This method is intended to say that it expects a data provider using the same version as the code, not that it is temporary and subject to API changes at any point.

Therefore I prefer with_versioned_provider.

If you decide to go with unstable I'd recommend documenting over each constructor what unstable means in this case very clearly.

zbraniecki avatar Sep 17 '22 01:09 zbraniecki

it makes it sound like you're working with something that may change from under you.

But that's exactly what the intent is. The function bound may change at any time, which could break code.

I tried to improve the docs for this in #2573; can you give feedback over there to land on verbiage first, and then revisit the naming?

sffc avatar Sep 17 '22 02:09 sffc

Slightly confused about the seeming contradiction between

  • option 2 = _with_versioned_provider + “I lean toward (2).”
  • “OK with _with_unstable_provider?”

I agree that _with_xyz_provider is more consistent than just _unstable.

Shane's arguments for _with_unstable_provider make sense to me.

All of these are parameterized on a v1 provider type, right? You could emphasize the instability by adding the v1 into the function name, such as _with_v1_provider or _with_unstable_v1_provider. When it changes, it's both the type parameter and the function name.

markusicu avatar Sep 17 '22 03:09 markusicu

Hmm, _with_unstable_provider makes it sound like it's requesting some kind of UnstableProvider, whereas it's not the provider that's unstable, it's the function that is. Not sure what it should be, I find the status quo pretty ok.

Manishearth avatar Sep 18 '22 23:09 Manishearth

Slightly confused about the seeming contradiction between

  • option 2 = _with_versioned_provider + “I lean toward (2).”
  • “OK with _with_unstable_provider?”

Typo, sorry. I lean toward (3).

All of these are parameterized on a v1 provider type, right? You could emphasize the instability by adding the v1 into the function name, such as _with_v1_provider

Yes. The catch is that the old _with_v1_provider doesn't have a strong use case on its own. Anyone calling _with_v1_provider may as well call _with_any_provider. The _unstable constructor always selects the minimal latest data required, which is its value proposition.

sffc avatar Sep 18 '22 23:09 sffc

with_provider_unstable

with_data_provider_unstable

robertbastian avatar Sep 19 '22 17:09 robertbastian

Looks like it's time for another ranked choice voting :)

https://docs.google.com/document/d/1AXm2f9Ngq06ESPA02DMbkq0su8s-U6CzE03CM1HuXl0/edit#

sffc avatar Sep 19 '22 19:09 sffc

My public appeal - the term "unstable" is meaningful to us because we understand what is unstable about the operation. It is meaningless, confusing and scary to the user of the API.

It is meaningless because it is not communicating information that majority of the users of the API will retrieve from the name. I believe more than 9/10 potential users of our API (non-i18n experts, OS/app developers who look to internationalize their app) will not be able to answer the question what "unstable" in this API name means correctly.

In itself, that might be ok - we could claim we want to put the user "on guard" and make them pay attention to docs to understand what they're doing. But then, it's also confusing because unstable exists in stdlib in form of slice::sort_unstable where it means that the output may not preserve order. From docs:

This sort is unstable (i.e., may reorder equal elements)

Our unstable means something very different, of different significance and considerations. Out of potential users of our API who may think they know what the word in the API name means, I believe they will try to conjure and answer based on the stdlib definition.

Finally, it's scary which means it builds a subliminal sense of discomfort from using the API. My mental experiment is that I believe if you ask someone to rate "how enjoyable using ICU4X is" before and after encountering this API name, the avg response value will drop.

For the last one, I have a grievance with it overall - our API is growing in words that repulse people from using it and makes working with our API feel like a minefield of hermetic nuances that we force customers to navigate through.

One of the core values of ECMA-402 API is that it hides complexity - it enables non-i18n, non-complex-data-management-experts, non-types-experts developers to use I18n API, mostly do the right thing on the first try, and in result report high satisfaction. One of the core values of Rust is that it scales complexity. You can do Vec::sort and it'll do the right thing for most cases. It won't be the fastest, or the leanest, but it'll work. You can f64 FromStr and ToString and it'll do the right thing. There will be more specialized, longer named versions and great docs both in convenient, specialized and summary position to help customers who want to read to educate themselves and make the right choice, but there are few (none?) cases where a core operation for a given component places user in front of try_new_with_fazoo_baboozle and try_new_experimental_unstable_mantisa and try_new_with_any_anomolouser and asked to read detailed doc on a complex subject before you can use the API.

The ICU4X today is unable to be used without understanding details on different providers, type system, calendar information (What's ISO calendar?) and float precisions. I am interested in filing a broader issue depending on the outcome of this conversation because the constructor of our components is at the core of what every user will have to jump through to use ICU4X and if the group will decide that I'm wrong and having specialized names is the right way to go, then I'll disagree and commit. :)

zbraniecki avatar Sep 19 '22 21:09 zbraniecki

But then, it's also confusing because unstable exists in stdlib in form of slice::sort_unstable where it means that the output may not preserve order

FWIW I consider this a special case where "unstable" as a term of art exists for that specific operation already; the Rust community to me seems to generally use "unstable" rather consistently otherwise.

I feel like we can sufficiently document this as "use the other two unless you need to care about codesize". I think it's fine for a library to have some scarier, more advanced APIs that most people are not expected to use. I think we need to be clear about it in the docs and make it easy to understand without having to read that whole page.

Perhaps the name should instead be unstable_try_new() so that it's not visually similar to the other two.

It's worth noting; having unstable APIs at all is a big no-no in the Rust space. We're doing it because we have good reason to; I think the least we can do is mark it clearly in the name.

The ICU4X today is unable to be used without understanding details on different providers, type system, calendar information (What's ISO calendar?) and float precisions.

I disagree with the comment on calendars; the default DTF works without people needing to understand calendars, though the docs do mention them a bunch. I think from an API design perspective it's pretty sound; from a docs perspective we could do more work.

I do agree with the sentiment in general, though. I think since data loading is such a core value proposition the any/buffer distinction will be something people need to deal with, but we can perhaps limit the perceived complexity there. I do like the idea of making it unstable_try_new() instead so it no longer looks visually a part of the same group.

Manishearth avatar Sep 19 '22 23:09 Manishearth

An unstable_ prefix might also help in that these functions should be grouped together in the docs, and alphabetically late.

Are all constructors always at the top? If not, consider explicitly moving scary stuff to appear last.

markusicu avatar Sep 19 '22 23:09 markusicu

Before I say anything further: have y'all read my new proposed documentation page comparing the three constructors and when you should use each one? All three have very clear use cases. I want to make sure we are on the same page.

https://github.com/unicode-org/icu4x/pull/2573

I'm very much in favor of making "easy" constructors with better ergonomics as long as they sit alongside the complicated ones that give you the best metrics. This has been core to my mental model on the structure of ICU4X APIs for a long time, going back to one of my foundational design docs about "wrapper layers".

With the data provider constructors, we're in an unfortunate situation where there's just no great way to make an easy constructor. I've posted multiple issues and driven multiple discussions in the past on this issue. The current group consensus is to move forward with the 3 copies of each constructor in Rust and keep the door open to improve on this in 2.0 and beyond.

I feel like we can sufficiently document this as "use the other two unless you need to care about codesize". I think it's fine for a library to have some scarier, more advanced APIs that most people are not expected to use. I think we need to be clear about it in the docs and make it easy to understand without having to read that whole page.

Two things here:

  1. There are use cases for the unstable constructor: crabbake, and things with blanket impls (like data caches). It's not just something to use for code size, although it does benefit code size.
  2. The PR linked above, which you reviewed, tries to address this. If you have concrete suggestions about phrasing, please leave them in the PR, #2573.

One more thing that's on my mind: many of these problems would go away if we could only use DataProvider constructors and not BufferProvider/AnyProvider constructors. We could achieve this by having try_new_unstable and try_new_v1. We decided against this previously, because we thought it was more bad than the other bad solution we chose, but maybe it's worth revisiting since I think it's likely that we're going to continue hearing a chorus of confusion over the explicit buffer/any stuff.

sffc avatar Sep 19 '22 23:09 sffc

To elaborate on my last paragraph, we could have done it this way:

  1. try_new_latest (unstable)
  2. try_new_1 (stable with DataProvider bounds)

A downside here is that people need to explicitly upgrade their call sites from try_new_1 to try_new_2 when a new version comes out, or else they will be stuck with old data and old behavior.

sffc avatar Sep 19 '22 23:09 sffc

I disagree with the comment on calendars; the default DTF works without people needing to understand calendars,

How does one create a new Date or DateTime without understanding what ISO is or knowing what Gregorian is?

JS:

let date = new Date("2022-09-22");
let dtf = new Intl.DateTimeFormat("en");
dtf.format(date);

We can dive into details on calendar-defaultism, and Date in JS has its own set of papercuts, but the API is clean and as easy as it goes. You can even do:

let date = new Date("2022-09-22");
date.toLocaleString("en");

to get the same. No mentions of cryptic names, no scary words, no forced implementation detail choices.

I don't suggest we go all the way there, but I do think there's a value in it for the DX. The mental model I was taught is that using API/syntax to force users to make educated choices doesn't accomplish the goal and it lowers the DX.

have y'all read my new proposed documentation page comparing the three constructors and when you should use each one?

I have. It does a great job at alleviating the issue introduced by the API naming schemes. If we stick to elaborate names, then that's as far as we can go to fix it. I appreciate you putting time to improve the docs!

zbraniecki avatar Sep 20 '22 00:09 zbraniecki

How does one create a new Date or DateTime without understanding what ISO is or knowing what Gregorian is?

Once we have the full time zone stack finished (which I was really hoping to see in 1.0 but didn't make it), you will be able to pass a timestamp into ZonedDateTimeFormatter without needing to explicitly acknowledge the calendar system, just like in classic ECMAScript and in Temporal.Absolute. That's your all-in-one, super-easy API.

However, I would note that the Temporal champions agreed that you cannot reason about plain dates without acknowledging either ISO or a local calendar system. I assume you're familiar with calendar-review.md.

sffc avatar Sep 20 '22 00:09 sffc

If you have concrete suggestions about phrasing, please leave them in the PR

Oh, to be clear I think that PR does things well; I was just highlighting the need for it :smile:

I think your doc page plus potentially moving _unstable to the front of the function name might solve this issue sufficiently well. We can also move it to the bottom and if people care they can find it after learning about the existence of such ctors from your ctor docs.

@zbraniecki

How does one create a new Date or DateTime without understanding what ISO is or knowing what Gregorian is?

Ah, I forgot about that part. The formatting is decoupled, though.

@markusicu

Are all constructors always at the top? If not, consider explicitly moving scary stuff to appear last.

Nope, they're in source order (mostly, handwave handwave).

Manishearth avatar Sep 20 '22 03:09 Manishearth

"unstable" can mean many very different things of different scariness. If we mean semver-unstable, let's have "semver" in some form in the name.

hsivonen avatar Sep 21 '22 08:09 hsivonen

So far there is no clear winner from the ranked choice voting. No choice has an average rank higher than 3, and every choice has at least 1 veto. Option 1 (try_new_unstable) has the most 1st choice votes and also wins according to traditional ranked choice voting rules. However, this pulls from a pool of only 5 votes.

I'm going to assume we keep the status quo unless a clear winner emerges before we stamp the 1.0 release. If we decide on a better name later, we can always #[doc(hidden)] the old one as an alias to keep stability.

"unstable" can mean many very different things of different scariness. If we mean semver-unstable, let's have "semver" in some form in the name.

I would prefer to not invent new language. A quick search of docs.rs doesn't appear to pull up precedent of "no_semver" in method names. I think "unstable" is fine; it means you need to read the docs for why it is unstable, like "unsafe" means you need to read the docs for why it is unsafe. We have the functions all clearly marked now with a yellow box explaining why they are unstable.

sffc avatar Sep 25 '22 06:09 sffc