icu4x
icu4x copied to clipboard
Fix consistency of from_bytes, try_from_bytes, from_str, try_from_str
In our docs there are 95 and 63 hits for from_bytes and try_from_bytes (the former includes counts from the latter), and there's also a mix of from_str and try_from_str.
We should try to fix this in 2.0. I suggest:
- By default, fallible conversions (they are usually fallible) should have
try_from_bytesandFromStr::from_str. In the cases where they are not fallible,from_bytesis fine. - Special cases can be discussed when they are encountered.
Alternatively, we could rename everything to try_from_utf8 instead of try_from_bytes.
EDIT: I think my preference has changed to also include try_from_str constructors. See discussion.
Feedback needed from:
- [x] @Manishearth
- [ ] @hsivonen
- [ ] @zbraniecki
- [ ] @robertbastian
I think FromStr::from_str should dictate naming. It is not an ideal name, but it's the name Rust uses, and we should follow convention.
I like the TinyAsciiStr model: it has a FromStr::from_str, a pub const fn from_str(s: &str) -> Result<Self>, which shadows the trait method, and an analogous from_bytes. Having an inherent from_str shadowing the trait is good for two reasons: it can be const, which the trait method cannot be, and it can be called without importing FromStr (which is not in the prelude). I think this method should not be called try_from_str, because it is FromStr::from_str. Giving it a second name is confusing, and it removes the shadow, so callers would have to either use the different name, or import core::str::FromStr.
So my suggestion is:
- Types with fallible conversions
- Implement
(const) fn from_bytes(&[u8]) -> Result<Self, E>andFromStr - If conversion can be const, add an inherent
const fn from_str - Maybe add an inherent
fn from_stranyway, for calling convenience
- Implement
- Types with infallible conversions
- Implement
(const) fn from_str(&str) -> Self,(const) fn from_bytes(&[u8]) -> Self - ~Maybe implement
FromStrwithErr = Infallible~
- Implement
- Types will either have fallible or infallible conversions, so these won't clash
These methods would have to be renamed to follow that scheme:
icu::locid::{lots of types}::try_from_bytes(2.0-breaking)icu::timezone::{CustomTimeZone, GmtOffset}::try_from_bytes(2.0-breaking)zerovec::ZeroSlice::try_from_bytes(util)
These methods should be FromStr::from_str anyway, will send a PR:
icu::experimental::relativetime::provider::SingularSubPattern::try_from_str(experimental, provider)icu_pattern::Pattern::try_from_str(util, unreleased)
I roughly agree with Robert's stance with the caveat that I'm not a huge fan of having inherent methods named the same as trait methods that are implemented by the type (so the "name something the same to get it const" is not that great, for me)
I strongly prefer using try_ for fallible non-trait methods.
It is not an ideal name, but it's the name Rust uses, and we should follow convention.
I consider FromStr to be a bit legacy not only because of the lack of try_ but also because its error type is called Err instead of Error as used in newer traits like TryFrom, whose convention I prefer to follow.
Also, people don't usually call FromStr::from_str directly because (1) it's not in the prelude and (2) people use .parse(). I see it as a means to an end to be more consistent with the Rust ecosystem, but we should define our own constructors on top of impl FromStr.
Anecdotally, I have found it misleading when the only constructor for various ICU4X types is impl FromStr. I would rather look at the docs and see a try_from_... and then I instantly know that's how to construct the thing.
const fn from_str is interesting; I hadn't thought of that case in the OP. I think I prefer try_from_str because the function shows up in our docs as a constructor.
with the caveat that I'm not a huge fan of having inherent methods named the same as trait methods that are implemented by the type
@Manishearth If you hold this position, should we reconsider https://github.com/unicode-org/icu4x/issues/4590?
(const) fn from_str(&str) -> Self
This is problematic because the signature is different than the trait function FromStr::from_str. I think if a function shadows, it should have the same signature.
How about this revised proposal:
- Fallible String Constructors:
impl FromStrpub fn try_from_bytes--constif possiblepub fn try_from_strfor discoverability --constif possible.
- Infallible String Constructors:
impl FromStrwithtype Err = Infalliblepub from from_bytes--constif possible- Do not implement
from_strbecause the signature is different than the trait function. May optionally consider a name such asnew_from_strorfrom_idor something.
(const) fn from_str(&str) -> Self
This is problematic because the signature is different than the trait function FromStr::from_str. I think if a function shadows, it should have the same signature.
I'm fine not implementing FromStr in the infallible case.
Actually I'm wholly against FromStr for infallible. from_str should be an inherent method, and it should not implement the trait method because the signature is different than the inherent function, we don't have infallible unwrapping (so this will be annoying to use), and there is usually no "parsing", just rewrapping (cf UnvalidatedStr), so this being available through str::parse is weird.
Also, people don't usually call FromStr::from_str directly because (1) it's not in the prelude and (2) people use .parse(). I see it as a means to an end to be more consistent with the Rust ecosystem, but we should define our own constructors on top of impl FromStr.
We actually use from_str in our docs a lot. From these three declarations, I vastly prefer the first:
let foo = Foo::from_str("foo")?;let foo = "foo".parse::<Foo>()?;let foo: Foo = "foo".parse()?;
I think try_from_utf8 is slightly better than try_from_bytes as it makes a normalized API space for utf8/utf16 (which is what we'll want in some hot paths like Locale parsing) and aligns with https://doc.rust-lang.org/std/str/fn.from_utf8.html .
Suggestion:
try_from_utf8/try_from_utf16+TryFrom<&[u8]>/TryFrom<&[u16]>from_utf8/from_utf16+From<&[u8]>/From<&[u16]>FromStr+TryFrom<&str>/From<&str>
We don't need to implement all of them for all constructors - just normalize the naming scheme so that we can introduce the ones we find useful incrementally.
+1 on using _utf8 instead of _bytes.
@Manishearth If you hold this position, should we reconsider #4590?
It's a very weakly held position, I think #4590 is okay. I'm also fine with the proposal here for similar reasons.
I'm working on locid now in context of icu_preferences, and once I'm done I'd like to unify and improve docs on handling of to/from u8/u16/str for all subtags for preparation for locid &[u16] handling.
Any thoughts on my proposal several comments above?
- @hsivonen - I think the
tryprefix makes sense when fallible. I thinkutf8makes sense for consistency with the standard library and pairing withutf16. - @robertbastian - I think, from a purity point, it should be
utf8, but it seems like it would make it less discoverable. - @robertbastian not a fan of duplicating methods, try_from_str would be the same method as from_str
- @sffc i think
FromStris nice to have. I don't actually like it much since it's not in the prelude and it doesn't have thetryprefix - @robertbastian i agree
from_stris bad. we should use it as little as possible, and this includes not using it in examples. I would be happy with this if we always used.parseortry_from_strbut neverfrom_str. - @sffc also don't like that it's undiscoverable
- @manishearth: would really like rust to allow you to "promote" trait methods to inherent (in docs and in resolution behavior)
- @sffc - I'm a bit worried about making
utf16a requirement; it seems like it would be an implementation challenge that isn't jusatified all the time - @robertbastian - I don't think it needs to be a requirement; we should add it if we want to
- @nekevss - It might be easy sometimes but it depends on the architecture of the parse function
- @sffc - Should we include
TryFrom<[u8]>as @zbraniecki suggested? - @robertbastian - It's not clear from such an impl that it expects utf-8. Also,
try_intois not a very ergonomic method, so if it's called asFoo::try_from(&[u8]), they should just calltry_from_utf8 - @sffc -
TryFrom<[u8]>implies that there is one way to build this from a[u8], but we have multiple ways, such as Postcard deserialization. - @robertbastian -
FromStrgives usparse, which some users might expect, what doesTryFrom<[u8]>give us? I don't see a use case for using it generically - @hsivonen - Is it useful for using generically, like if you have potentially ill-formed UTF-8 or UTF-16?
- @sffc - In such a case, we should implement
TryFrom<PotentialUtf8>. We can add this later, perhaps.
Concrete proposal:
All types that are fallibly created from a string have the following functions:
try_from_str -> Result<Self>try_from_utf8 -> Result<Self>FromStr::from_str -> Result<Self>(only ever called throughparsein documentation, but only if it can be done without turbofishes, otherwiseFoo::try_from_str)- Optional:
try_from_utf16 -> Result<Self>(only if the impl would benefit from such a function)
All types that are infallibly created from a string have the following functions:
from_str -> Selffrom_utf8 -> Self- Optional:
from_utf16 -> Self(only if the impl would benefit from such a function)
LGTM: @hsivonen @sffc @Manishearth @robertbastian
I didn't realize I signed off on the "(only ever called through parse in documentation, but only if it can be done without turbofishes, otherwise Foo::try_from_str)" -- I agree with the "only ever called through parse in documentation" but not necessarily the "only if it can be done without turbofishes"
LGTM with the same alternation as @sffc pointed ou in the last comment. I'm not convinced that turbofish specifier is bad enough to have a blank rule against it in docs.
What about this pattern:
pub struct Era(pub TinyStr16);
impl From<TinyStr16> for Era {
fn from(x: TinyStr16) -> Self {
Self(x)
}
}
impl FromStr for Era {
type Err = <TinyStr16 as FromStr>::Err;
fn from_str(s: &str) -> Result<Self, Self::Err> {
s.parse().map(Self)
}
}
Both the impl From<TinyStr> as well as the impl FromStr are kind of pointless, as TinyStr already has try_from_utf8/try_from_utf16/try_from_str/from_str, and the field is public.
This is done for all stable components and utils, except for the datetime and plurals reference and runtime modules.
@zbraniecki WDYT about doing the facelift to those modules?
Also in 2.0 (can be stretch), apply this style to all stringy APIs such as canonicalize, normalize, ...
I think we should align them with the decision here.
- @sffc I think the reason I had this on the agenda was that I think we should fix all functions that take strings, not just
try_fromfunctions - @robertbastian - yeah some things like
normalizeacceptAsRef<[u8]>to be generic over&[u8]and&str, but that's not great, both documentation-wise, and because the lifetime gets lost throughAsRef - @sffc And how about the
plurals::referenceanddatetime::referencemodules? - @robertbastian If they are truly datagen-only, I don't think we should touch them
- @sffc They are
doc(hidden)currently, so let's hold off until we have a decision on #5181 - @sffc Is it
normalizeornormalize_str? Maybe we say that we do_stronly forfromfunctions or functions that could take other types of arguments
Conclusion:
- Update all string-accepting functions to be
foo(&str),foo_utf8(&[u8]),foo_utf16(&[u16]) - Don't touch
doc(hidden)modules at this time
LGTM: @sffc (@robertbastian) @Manishearth
@robertbastian How much of this is done? Does this just need a final audit, or are there pieces that still need doing?
No change since the last update from me.
Blocked on #5181
@robertbastian is that the only case? I'd say we can close this issue then.
Again, https://github.com/unicode-org/icu4x/issues/4931#issuecomment-2354860616
datetime::pattern::reference is now datetime::provider::pattern::reference (an unstable module)
Assuming https://github.com/unicode-org/icu4x/pull/6103, I think we can close this out once that lands.
That landed, nothing left is in stable code.