icu4x
icu4x copied to clipboard
Reduce stack size of ShortVec in icu_locid, helping reduce Locale & LanguageIdentifier
Currently, we define ShortVec
to be
enum ShortVec<T> {
Empty,
Single(T),
Multi(Vec<T>),
}
This requires 32 bytes on x86_64: Vec<T>
is 3*usize (24 bytes), and the discriminant requires one additional word.
Pending some dependencies, for T = NonZeroUsize
and smaller, 16 bytes should be achievable as follows:
enum ShortBoxSlice<T> {
ZeroOne(Option<T>),
Multi(Box<[T]>),
}
(thanks @Manishearth and @mikebenfield)
Some notes:
- This depends on the Rust upstream PR https://github.com/rust-lang/rust/pull/94075 landing first
- For this to work for
Variant
, which is the motivating use case, we need to fix #2083 first - It hopefully won't be necessary to expand the
Box
into an explicitNonZeroPtr
pointer andNonZeroUsize
length, but that is an option we could pursue if needed
If we want to go further than the ShortVec
regression we could put the Extensions
struct on the heap and cut the size of Locale
by 75% (216 to 56 bytes with the current ShortVec
).
Once on the heap we can make Extensions
?Sized
, which would unlock some optimizations that could offset the extra deref, such as storing the most common extension type (unicode?) inside the struct instead of its own allocation.
Edited, had a typo that messed up results before. I didn't sleep much.
I just tried this with today's nightly:
enum ShortVec<T> {
Empty,
Single(T),
Multi(Vec<T>),
}
assert_eq!(size_of::<ShortVec<usize>>(), 32);
assert_eq!(size_of::<ShortVec<NonZeroUsize>>(), 32);
enum ShortVec2<T> {
Empty,
Single(T),
Multi(Box<[T]>),
}
assert_eq!(size_of::<ShortVec2<usize>>(), 24);
assert_eq!(size_of::<ShortVec2<NonZeroUsize>>(), 24);
enum ShortVec3<T> {
ZeroOne(Option<T>),
Multi(Box<[T]>),
}
assert_eq!(size_of::<ShortVec3<usize>>(), 24);
assert_eq!(size_of::<ShortVec3<NonZeroUsize>>(), 16); // this is 24 on 1.64
Don't celebrate, this seems very brittle and won't work with TinyAsciiStr
:
https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=89c6184a24634bf80c245ffbcbfc0928
^ @mikebenfield Can you take a look at that playground? Not sure why that niche is not being found.
Oh, wait, it's fine. It should be an array of NonZeroU8
not NonZeroUsize
https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=cd9d9465aa1b58927f7668ac1148e8e5
Oh yeah. But will it work with a struct around the array? 🤷♂️
FWIW, if it's strongly preferable for you all to maintain the 3 variants like
enum ShortVec<T> {
Empty,
Single(T),
Multi(Box<[T]>),
}
and for ShortVec<NonZeroUsize>
to still be 16 bytes...
That would actually be achievable with the changes from this PR plus one more easy optimization that maybe I should add to it anyway. However, no guarantees about when that PR gets merged; I'm kind of waiting on something else before I continue work on it.
Yes that would be ideal. I filed rust-lang/rust#102997 about this.
Let's go with ShortVec3
and not wait on Rust. cc @sffc
@pdogr implemented ShortVec3
but used Vec
instead of Box
, as otherwise pushing would always reallocate.
If we use Box
we probably want to remove push
, remove
, and insert
. @pdogr to investigate the impact of this. Afaict we only push in one location (Value::try_from_bytes
), which we can probably customize to not use push
but somehow assemble from the iterator with a single allocation.
Closed by #3220