icu4x
icu4x copied to clipboard
Reducing size of DateTimeFormatter and other large objects
DateTimeFormatter is 6.6 KB on the stack. This is because it owns a lot of DataPayloads, which are themselves collections of pointers to strings either in bake data or in a postcard buffer. In total, there are hundreds of strings in DateTimeFormatter that may or may not be present in a particular instance.
This is more impactful in C++ than in Rust, because we put the DateTimeFormatter on the heap when returning it to C++.
A few options:
- Ignore. C++ users need more heap memory.
- Add a global feature on icu_provider that makes DataPayload be much smaller in the BakedDataProvider case, and allocating the existing yoke in the BufferProvider case. The BakedDataProvider will not call ZeroFrom, instead leaving just a
&'staticreference.- Pro: Easy to configure for clients who only care about bake data.
- Con: Postcard clients take a hit if this mode is enabled.
- Con: Not great to use with
--all-features
- Add a type parameter to all formatters that causes them to either have large stack or small stack. The type parameter would have a default value for the status quo behavior (large stack). All formatters would then gain
try_new_with_static_providerwhich would return the formatter in small-stack mode. BakedDataProvider would implement a trait that can be used with this new function.- Pro: No global behavior-switching feature flag
- Pro: Code for optimal Postcard and Baked usage can co-exist
- Con: Much more complicated in the type system; we are already pushing the bounds of the type system
- Con: Might negatively affect code size; would need to measure impact
- Other?
CC @Manishearth @robertbastian
Update (29-06-2023)
Sub-issues:
Pre-2.0:
- [ ] https://github.com/unicode-org/icu4x/issues/3347 (existing issue)
- [ ] https://github.com/unicode-org/icu4x/issues/3600
Post-2.0:
- [x] https://github.com/unicode-org/icu4x/issues/3599
- [ ] https://github.com/unicode-org/icu4x/issues/3598
Two other options that came up during discussion:
ZeroBox: We introduce a typeZeroBox/VarZeroBoxwhich has the semantics of a boxingCow(by default Cow goes to theToOwnedtype). It requires the entire data behind it to be ULE. We sprinkle it in on large stack size data payloads. Tradeoffs:- Pro: No global flag
- Pro: Can be targeted
- Pro: does not disadvantage postcard over baked
- Con: not a perfect win for Baked since the untargeted cases still take up stack space
- Con: Probably slower as it'll go through ULE machinery
- Con: lots and lots of annoying work making everything ULE or VarULE.
- Con: needs
try_new_with_static_provider()
DataPayloadCow: We introduce a typeDataPayloadCowthat has the semantics of a boxingCowaroundDataPayload.- Pro: can be targeted
- Pro: no global flag
- Pro: no need to ULE everything
- Con: forces allocations on postcard each time
- Con: still an imperfect win for Baked
- Con: needs
try_new_with_static_provider()
In any of these modes that involves boxing the data struct for postcard users, it would be nice if they could work with a cache of deserialized postcard objects (the cache owns the heap-allocated objects and the formatters reference them out of the cache). Haven't thought through how it would work exactly.
Discuss with:
- @Manishearth
- @sffc
- @robertbastian
Related: https://github.com/unicode-org/icu4x/issues/3020
core::mem::size_of::<alloc::borrow::Cow<str>>() is 32 on stable and 24 on nightly. That's nice, but maybe we can get it down to 16 if we make the owned type Box<str> instead of String and find some niche.
I don't see a way to get to 16 for a cowable slice. In the non-null case, both the borrowed and the owned variants need two full usizes. The only niche is when the pointer is exactly 0, which is sufficient for Option but not when both variants need 16 bits of storage. So, since we need to overflow into a third word, we may as well keep the capacity.
We could for example make the assumption that our strings will at most be half a usize long.
Maybe, but then we technically need fallible conversions from Box<str> and &str into HalfLengthCowStr. Or maybe that's enough of a degenerate case that we can debug-assert and GIGO.
- @sffc I think it's a no brainer to try specifically fixing DateTimeFormat on the pre-2.0 timeline. Fancy cross-cutting things like ZeroBox/etc are maybe not worth it
- @Manishearth Do we have other things as large as DateTimeFormat?
- @sffc not even close. we should write a test in verify-zero-copy or in fingerprint
Talking about map_project issue
- @Manishearth in some cases may be able to introduce
map_project_maybe_ref(|x| -> y, |&x| -> &y) - @sffc could also introduce a DataPayloadOnlyYoke that avoids the staticref problem
- @sffc in datetime we only use map_project for patterns
- @sffc Would be much simpler to store patterns in a small stack with heap overflow. So we don't need map_project anymore
- When @eggrobin does his refactor; reduce map_projects or make them less impactful
Options for fixing datetimeformat size itself:
- Make SolarTwelve a VZV. Maybe also make the other things ZeroVecs. Can be done backwards compatibly but will make old-data-new-code (postcard) cause allocs.
- Split symbols further into tiny pieces, one DTF object only needs a small subset of symbols. How does the data loading work??? @eggrobin is somewhat skeptical about this actually reducing size too much due to complex dependencies
- Use PatternInterpolator design to making a billion datetimeformat objects, this would imply one per locale (and multiple interpolators per locale each of which is small)
- Use some funky DataPayload-derived type in DTF
- DataPayload gets APIs that expose its guts
- DateTimeFormat instead of containing a DataPayload (for symbols) will contain something that is effectively an
enumbetweenBox<Yoke>and&'static Symbols. A potential future no-alloc mode would omit the box. - Can be done backcompat-ily
- If we split the data later we can do some weird enumy things
@sffc For option 1 it's a bunch of tricky custom deser impls, maybe just wait for 2.0 and Better Sliced Symbols.
Let's do number 4 (new data payload) and we're doing number 3 anyway (pattern interpolator). We can do Numbers 1 and 2 in the 2.0 timeframe. Number 2 needs investigation.
Decision: @sffc @Manishearth @younies @robertbastian @skius (@eggrobin :confused:)
Milestones: @sffc hoping to get the new pattern interpolator done in 1.4. New data payload thingy should be Q3, 1.x Priority.
manish to file issues
Issues filed:
- https://github.com/unicode-org/icu4x/issues/3599
- https://github.com/unicode-org/icu4x/issues/3598
- https://github.com/unicode-org/icu4x/issues/3347 (existing issue)
- https://github.com/unicode-org/icu4x/issues/3600