icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Fix consistency of from_bytes, try_from_bytes, from_str, try_from_str

Open sffc opened this issue 1 year ago • 20 comments

In our docs there are 95 and 63 hits for from_bytes and try_from_bytes (the former includes counts from the latter), and there's also a mix of from_str and try_from_str.

We should try to fix this in 2.0. I suggest:

  • By default, fallible conversions (they are usually fallible) should have try_from_bytes and FromStr::from_str. In the cases where they are not fallible, from_bytes is fine.
  • Special cases can be discussed when they are encountered.

Alternatively, we could rename everything to try_from_utf8 instead of try_from_bytes.

EDIT: I think my preference has changed to also include try_from_str constructors. See discussion.

Feedback needed from:

  • [x] @Manishearth
  • [ ] @hsivonen
  • [ ] @zbraniecki
  • [ ] @robertbastian

sffc avatar May 23 '24 03:05 sffc

I think FromStr::from_str should dictate naming. It is not an ideal name, but it's the name Rust uses, and we should follow convention.

I like the TinyAsciiStr model: it has a FromStr::from_str, a pub const fn from_str(s: &str) -> Result<Self>, which shadows the trait method, and an analogous from_bytes. Having an inherent from_str shadowing the trait is good for two reasons: it can be const, which the trait method cannot be, and it can be called without importing FromStr (which is not in the prelude). I think this method should not be called try_from_str, because it is FromStr::from_str. Giving it a second name is confusing, and it removes the shadow, so callers would have to either use the different name, or import core::str::FromStr.

So my suggestion is:

  • Types with fallible conversions
    • Implement (const) fn from_bytes(&[u8]) -> Result<Self, E> and FromStr
    • If conversion can be const, add an inherent const fn from_str
    • Maybe add an inherent fn from_str anyway, for calling convenience
  • Types with infallible conversions
    • Implement (const) fn from_str(&str) -> Self, (const) fn from_bytes(&[u8]) -> Self
    • ~Maybe implement FromStr with Err = Infallible~
  • Types will either have fallible or infallible conversions, so these won't clash

These methods would have to be renamed to follow that scheme:

  • icu::locid::{lots of types}::try_from_bytes (2.0-breaking)
  • icu::timezone::{CustomTimeZone, GmtOffset}::try_from_bytes (2.0-breaking)
  • zerovec::ZeroSlice::try_from_bytes (util)

These methods should be FromStr::from_str anyway, will send a PR:

  • icu::experimental::relativetime::provider::SingularSubPattern::try_from_str (experimental, provider)
  • icu_pattern::Pattern::try_from_str (util, unreleased)

robertbastian avatar May 23 '24 14:05 robertbastian

I roughly agree with Robert's stance with the caveat that I'm not a huge fan of having inherent methods named the same as trait methods that are implemented by the type (so the "name something the same to get it const" is not that great, for me)

Manishearth avatar May 23 '24 16:05 Manishearth

I strongly prefer using try_ for fallible non-trait methods.

It is not an ideal name, but it's the name Rust uses, and we should follow convention.

I consider FromStr to be a bit legacy not only because of the lack of try_ but also because its error type is called Err instead of Error as used in newer traits like TryFrom, whose convention I prefer to follow.

Also, people don't usually call FromStr::from_str directly because (1) it's not in the prelude and (2) people use .parse(). I see it as a means to an end to be more consistent with the Rust ecosystem, but we should define our own constructors on top of impl FromStr.

Anecdotally, I have found it misleading when the only constructor for various ICU4X types is impl FromStr. I would rather look at the docs and see a try_from_... and then I instantly know that's how to construct the thing.

const fn from_str is interesting; I hadn't thought of that case in the OP. I think I prefer try_from_str because the function shows up in our docs as a constructor.

sffc avatar May 24 '24 07:05 sffc

with the caveat that I'm not a huge fan of having inherent methods named the same as trait methods that are implemented by the type

@Manishearth If you hold this position, should we reconsider https://github.com/unicode-org/icu4x/issues/4590?

sffc avatar May 24 '24 07:05 sffc

(const) fn from_str(&str) -> Self

This is problematic because the signature is different than the trait function FromStr::from_str. I think if a function shadows, it should have the same signature.

How about this revised proposal:

  • Fallible String Constructors:
    • impl FromStr
    • pub fn try_from_bytes -- const if possible
    • pub fn try_from_str for discoverability -- const if possible.
  • Infallible String Constructors:
    • impl FromStr with type Err = Infallible
    • pub from from_bytes -- const if possible
    • Do not implement from_str because the signature is different than the trait function. May optionally consider a name such as new_from_str or from_id or something.

sffc avatar May 24 '24 07:05 sffc

(const) fn from_str(&str) -> Self

This is problematic because the signature is different than the trait function FromStr::from_str. I think if a function shadows, it should have the same signature.

I'm fine not implementing FromStr in the infallible case.

robertbastian avatar May 24 '24 08:05 robertbastian

Actually I'm wholly against FromStr for infallible. from_str should be an inherent method, and it should not implement the trait method because the signature is different than the inherent function, we don't have infallible unwrapping (so this will be annoying to use), and there is usually no "parsing", just rewrapping (cf UnvalidatedStr), so this being available through str::parse is weird.

Also, people don't usually call FromStr::from_str directly because (1) it's not in the prelude and (2) people use .parse(). I see it as a means to an end to be more consistent with the Rust ecosystem, but we should define our own constructors on top of impl FromStr.

We actually use from_str in our docs a lot. From these three declarations, I vastly prefer the first:

  • let foo = Foo::from_str("foo")?;
  • let foo = "foo".parse::<Foo>()?;
  • let foo: Foo = "foo".parse()?;

robertbastian avatar May 24 '24 09:05 robertbastian

I think try_from_utf8 is slightly better than try_from_bytes as it makes a normalized API space for utf8/utf16 (which is what we'll want in some hot paths like Locale parsing) and aligns with https://doc.rust-lang.org/std/str/fn.from_utf8.html .

Suggestion:

  • try_from_utf8 / try_from_utf16 + TryFrom<&[u8]> / TryFrom<&[u16]>
  • from_utf8 / from_utf16 + From<&[u8]> / From<&[u16]>
  • FromStr + TryFrom<&str>/From<&str>

We don't need to implement all of them for all constructors - just normalize the naming scheme so that we can introduce the ones we find useful incrementally.

zbraniecki avatar May 24 '24 10:05 zbraniecki

+1 on using _utf8 instead of _bytes.

robertbastian avatar May 24 '24 17:05 robertbastian

@Manishearth If you hold this position, should we reconsider #4590?

It's a very weakly held position, I think #4590 is okay. I'm also fine with the proposal here for similar reasons.

Manishearth avatar May 24 '24 18:05 Manishearth

I'm working on locid now in context of icu_preferences, and once I'm done I'd like to unify and improve docs on handling of to/from u8/u16/str for all subtags for preparation for locid &[u16] handling.

Any thoughts on my proposal several comments above?

zbraniecki avatar May 29 '24 06:05 zbraniecki

  • @hsivonen - I think the try prefix makes sense when fallible. I think utf8 makes sense for consistency with the standard library and pairing with utf16.
  • @robertbastian - I think, from a purity point, it should be utf8, but it seems like it would make it less discoverable.
  • @robertbastian not a fan of duplicating methods, try_from_str would be the same method as from_str
  • @sffc i think FromStr is nice to have. I don't actually like it much since it's not in the prelude and it doesn't have the try prefix
  • @robertbastian i agree from_str is bad. we should use it as little as possible, and this includes not using it in examples. I would be happy with this if we always used .parse or try_from_str but never from_str.
  • @sffc also don't like that it's undiscoverable
  • @manishearth: would really like rust to allow you to "promote" trait methods to inherent (in docs and in resolution behavior)
  • @sffc - I'm a bit worried about making utf16 a requirement; it seems like it would be an implementation challenge that isn't jusatified all the time
  • @robertbastian - I don't think it needs to be a requirement; we should add it if we want to
  • @nekevss - It might be easy sometimes but it depends on the architecture of the parse function
  • @sffc - Should we include TryFrom<[u8]> as @zbraniecki suggested?
  • @robertbastian - It's not clear from such an impl that it expects utf-8. Also, try_into is not a very ergonomic method, so if it's called as Foo::try_from(&[u8]), they should just call try_from_utf8
  • @sffc - TryFrom<[u8]> implies that there is one way to build this from a [u8], but we have multiple ways, such as Postcard deserialization.
  • @robertbastian - FromStr gives us parse, which some users might expect, what does TryFrom<[u8]> give us? I don't see a use case for using it generically
  • @hsivonen - Is it useful for using generically, like if you have potentially ill-formed UTF-8 or UTF-16?
  • @sffc - In such a case, we should implement TryFrom<PotentialUtf8>. We can add this later, perhaps.

Concrete proposal:

All types that are fallibly created from a string have the following functions:

  • try_from_str -> Result<Self>
  • try_from_utf8 -> Result<Self>
  • FromStr::from_str -> Result<Self> (only ever called through parse in documentation, but only if it can be done without turbofishes, otherwise Foo::try_from_str)
  • Optional: try_from_utf16 -> Result<Self> (only if the impl would benefit from such a function)

All types that are infallibly created from a string have the following functions:

  • from_str -> Self
  • from_utf8 -> Self
  • Optional: from_utf16 -> Self (only if the impl would benefit from such a function)

LGTM: @hsivonen @sffc @Manishearth @robertbastian

sffc avatar May 30 '24 17:05 sffc

I didn't realize I signed off on the "(only ever called through parse in documentation, but only if it can be done without turbofishes, otherwise Foo::try_from_str)" -- I agree with the "only ever called through parse in documentation" but not necessarily the "only if it can be done without turbofishes"

sffc avatar Jun 04 '24 04:06 sffc

LGTM with the same alternation as @sffc pointed ou in the last comment. I'm not convinced that turbofish specifier is bad enough to have a blank rule against it in docs.

zbraniecki avatar Jun 04 '24 05:06 zbraniecki

What about this pattern:

pub struct Era(pub TinyStr16);

impl From<TinyStr16> for Era {
    fn from(x: TinyStr16) -> Self {
        Self(x)
    }
}

impl FromStr for Era {
    type Err = <TinyStr16 as FromStr>::Err;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse().map(Self)
    }
}

Both the impl From<TinyStr> as well as the impl FromStr are kind of pointless, as TinyStr already has try_from_utf8/try_from_utf16/try_from_str/from_str, and the field is public.

robertbastian avatar Jun 19 '24 13:06 robertbastian

This is done for all stable components and utils, except for the datetime and plurals reference and runtime modules.

robertbastian avatar Sep 17 '24 08:09 robertbastian

@zbraniecki WDYT about doing the facelift to those modules?

sffc avatar Sep 17 '24 17:09 sffc

Also in 2.0 (can be stretch), apply this style to all stringy APIs such as canonicalize, normalize, ...

sffc avatar Sep 17 '24 18:09 sffc

I think we should align them with the decision here.

zbraniecki avatar Sep 17 '24 19:09 zbraniecki

  • @sffc I think the reason I had this on the agenda was that I think we should fix all functions that take strings, not just try_from functions
  • @robertbastian - yeah some things like normalize accept AsRef<[u8]> to be generic over &[u8] and &str, but that's not great, both documentation-wise, and because the lifetime gets lost through AsRef
  • @sffc And how about the plurals::reference and datetime::reference modules?
  • @robertbastian If they are truly datagen-only, I don't think we should touch them
  • @sffc They are doc(hidden) currently, so let's hold off until we have a decision on #5181
  • @sffc Is it normalize or normalize_str? Maybe we say that we do _str only for from functions or functions that could take other types of arguments

Conclusion:

  • Update all string-accepting functions to be foo(&str), foo_utf8(&[u8]), foo_utf16(&[u16])
  • Don't touch doc(hidden) modules at this time

LGTM: @sffc (@robertbastian) @Manishearth

sffc avatar Oct 03 '24 18:10 sffc

@robertbastian How much of this is done? Does this just need a final audit, or are there pieces that still need doing?

Manishearth avatar Jan 07 '25 23:01 Manishearth

No change since the last update from me.

robertbastian avatar Jan 08 '25 09:01 robertbastian

Blocked on #5181

robertbastian avatar Feb 11 '25 12:02 robertbastian

@robertbastian is that the only case? I'd say we can close this issue then.

Manishearth avatar Feb 12 '25 13:02 Manishearth

Again, https://github.com/unicode-org/icu4x/issues/4931#issuecomment-2354860616

robertbastian avatar Feb 12 '25 13:02 robertbastian

datetime::pattern::reference is now datetime::provider::pattern::reference (an unstable module)

sffc avatar Feb 12 '25 13:02 sffc

Assuming https://github.com/unicode-org/icu4x/pull/6103, I think we can close this out once that lands.

Manishearth avatar Feb 12 '25 13:02 Manishearth

That landed, nothing left is in stable code.

Manishearth avatar Feb 14 '25 13:02 Manishearth