icu4x Add icu4x_shared with some helpers

trafficstars

Relates to numerous issues:

#4467
#2310
#2778
#5124
#5127

Jun 21 '24 13:06 sffc

I'm not really in favor of introducing a Transparent trait, but I'll have a closer look later. I don't think such a trait is necessary to achieve what we need here.

Jun 22 '24 16:06 Manishearth

I'm not really in favor of introducing a Transparent trait, but I'll have a closer look later. I don't think such a trait is necessary to achieve what we need here.

I introduced it as a crate-private trait in order to make the macros safe. Happy for suggestions on how to do this better.

Jun 24 '24 15:06 sffc

Ah, I see what you've done here.

I think what I was envisioning was a way to not have any crate-specific unsafe code at use sites. A macro that can be invoked as

impl_helpers!(
 /// some docs
 #[derive(...)]
 #[repr(transparent)]
 pub struct Foo(Bar);
)

Thinking about it more it probably could still a trait so that it can be split into different cfg()able macros. I'm fine with that. But ideally the actual transparent impl can be done without writing unsafe.

Also FWIW while I don't consider it a blocking opinion, if we're putting unsafe stuff in icu4x_shared my opinion on #4467 does shift a bit: I kind of think unsafe stuff is ideally placed in a separate crate so projects can properly evaluate it. It is nice that the bulk of our crates currently have zero unsafe code, I'd love to retain that property. "No unsafe code" is a much easier slam-dunk evaluation than "unsafe code but it's copypaste unsafe code".

Jun 24 '24 16:06 Manishearth

Actually, no, thinking about it more, I do kind of consider this somewhat blocking. This goes back to the point I made about being very careful what we sacrifice in favor of the goals of serving cases like that of rust-url: I don't think "a lot more ICU4X crates contain unsafe code" is an acceptable cost. I'd rather this just lived in icu_provider, even though that's counter to #4467. Or a separate crate, but icu_provider seems fine. Probably doc(hidden) with the guarantee that we won't break its usage in ICU4X across minor versions (the policy agreed upon in https://github.com/unicode-org/icu4x/issues/4343#issuecomment-1834249362).

Jun 24 '24 16:06 Manishearth

I would use the ref-cast crate, but it doesn't support everything we need (box casting), and dtolnay says it's unlikely to support that.

Adding yet another crate to do this, in addition to the docs macros noted in #4467, has downsides:

It's a crate that doesn't actually do anything and is 100% ICU4X internal
ICU4X-internal APIs in core crates are costly due to the additional requirement that they can't change across minor versions, meaning they are effectively public stable APIs and we have less flexibility to change them
It increases our crate count even more, which we know is a problem, highlighted by rust-url

Some options:

Copy these repr(transparent) helper macros only to the crates that need them. Put the other macros proposed in #4467 in a different file that doesn't contain unsafe code.
Don't use macros and just copy and paste these unsafe impls everywhere. This is what we currently do. Pro: the unsafe code is close to where it is used. Con: the unsafe code is spread out which means it's more likely to have bugs; centralized unsafe code, even if it's in a file that we copy around to many places, is better from a safety and maintenance POV.
Use a crate and eat the cost. We already have a bazillion crates so what's one more?

Jun 24 '24 16:06 sffc

ICU4X-internal APIs in core crates are costly due to the additional requirement that they can't change across minor versions, meaning they are effectively public stable APIs and we have less flexibility to change them

I think this specific API is fine to put in a core crate, for two reasons:

It's primarily a macro. It is possible (though trickier) to write this in a way that is entirely a macro with no trait dependencies. This gives us a lot of flexibility when it comes to breakages.
I don't think we'll need to break how this works
As mentioned before, we do not need to avoid changing this in breaking ways, only in ways that break ICU4X. That's far easier to manage.

I was extremely in favor of copy-pasting for the docs macros, because there is almost zero cost to that. Making every ICU4X crate (or even "every ICU4X crate using these macros") contain unsafe code is a major cost and affects ecosystem perception as well as the practicalities of unsafe review.

Put differently: doc macros copy pasted across crates are effectively one bit of code that is seen in multiple places. The only impact "duplicate code" has is on maintainers, and we'll have tooling making it transparent. unsafe macros copy pasted across crates are not effectively the same bit of code seen in multiple places, each copy adds a cost, and it has impacts outside of maintainers, including in ways that affect the perception of our project.

Personally, I am in favor of option 4: these macros (and only these macros) live in icu_provider, doc(hidden) with a comment about stability across minor versions.

Preferences are 4 > 3 >> 1 > 2, I consider "every crate that needs this needs unsafe code" to be an unacceptable cost for what I have previously highlighted as a very nebulous benefit.

Jun 24 '24 16:06 Manishearth

Actually, hold on, I'm also comfortable with the unsafe macros living inside the varule crate. That would work just fine, and they could be public if we want, they're clearly useful.

Happy to write them in a way that doesn't involve Transparent if desired.

Jun 24 '24 17:06 Manishearth

This functionality is sufficiently similar to what's in zerovec that I'd be okay adding them there

In case it got lost, I want to point out that these are not only VarULE helper macros. They also encapsulate functionality from ref-cast, which is code we are already currently required to have sprinkled in various places. This particular crate already has unsafe code; the PR at hand does not change that. We're not regressing. We're not "sacrificing" anything in the more narrow context of this PR; that's only a hypothetical situation if a future migration changes a safe crate to an unsafe create. If we can move this crate's unsafe code elsewhere, that's a win.

Jun 24 '24 20:06 sffc

We're not regressing. We're not "sacrificing" anything in the more narrow context of this PR

Okay, but the "narrow context of the PR" is not the context in which we were planning this change: we were planning to use these helpers to clean up a large chunk of our derive(ULE) and VarULE uses. This is not some hypothetical future migration, this is a migration we tentatively agreed upon less than a week ago. If I have these objections, it would be quite counterproductive of me to wait for one of the bigger migrations to state these objections.

I also do think that from a reviewability perspective, this PR in its narrow context still makes things worse: it moves a single callsite over to a more indirect macro situation. The repr(transparent) unsafe we have currently that replaces ref-cast is duplicated boilerplate, but it's super super easy to review. Macros will obfuscate it, if we were just replacing manual-ref-cast (even with multiple callsites) with this macro I'd still be somewhat skeptical.

Jun 24 '24 22:06 Manishearth

I haven't done a full audit but I would make the claim that most variable length repr(transparent) things need both ref-cast and VarULE impls, and since all ref-cast impls currently involve unsafe, there might not be many cases where we would be turning a safe crate to an unsafe crate (which I agree is a big cost).

This PR definitely introduces indirection; my position is that it is an overall improvement, but it's a reasonable debate and one I was hoping to have. I think it is clearer and safer for the safety invariants to live as trait bounds in macros that all live in one file whose only job is to make it exceedingly obvious that the safety invariants are upheld rather than having random unsafe blocks sprinkled around the crate.

Jun 24 '24 23:06 sffc

Can you explain or demonstrate or cite how to do this safely without the trait bound? That might help move the conversation forward. Ultimately I'm mostly onboard with putting this in the zerovec crate.

Jun 24 '24 23:06 sffc

variable length

operative term here, I think we'll be applying these to most fixed length ones too

and I'm not opposed to the VarULE macros exposing these methods too, it's not a hard toggle to add

Jun 25 '24 00:06 Manishearth

Can the invariant be enforced without the trait by generating a static assertion that the sizes are equal?

Jun 25 '24 13:06 sffc

@sffc I don't think we need to test the invariant at all, as long as the macro controls generating the struct.

Jun 25 '24 15:06 Manishearth

I would be quite sad but not block if the solution is wrapping the struct definition in a macro. I prefer standalone macros with traits or assertions

Jun 25 '24 15:06 sffc

Going to move discussion to https://github.com/unicode-org/icu4x/issues/5127

Jun 26 '24 00:06 Manishearth

Converting to draft pending discussions

Jun 27 '24 21:06 sffc

All committers have signed the CLA.

Sep 29 '25 18:09 CLAassistant

icu4x icu4x copied to clipboard

Add icu4x_shared with some helpers

icu4x
icu4x copied to clipboard