icu4x Reducing size of DateTimeFormatter and other large objects

trafficstars

DateTimeFormatter is 6.6 KB on the stack. This is because it owns a lot of DataPayloads, which are themselves collections of pointers to strings either in bake data or in a postcard buffer. In total, there are hundreds of strings in DateTimeFormatter that may or may not be present in a particular instance.

This is more impactful in C++ than in Rust, because we put the DateTimeFormatter on the heap when returning it to C++.

A few options:

Ignore. C++ users need more heap memory.
Add a global feature on icu_provider that makes DataPayload be much smaller in the BakedDataProvider case, and allocating the existing yoke in the BufferProvider case. The BakedDataProvider will not call ZeroFrom, instead leaving just a &'static reference.
- Pro: Easy to configure for clients who only care about bake data.
- Con: Postcard clients take a hit if this mode is enabled.
- Con: Not great to use with --all-features
Add a type parameter to all formatters that causes them to either have large stack or small stack. The type parameter would have a default value for the status quo behavior (large stack). All formatters would then gain try_new_with_static_provider which would return the formatter in small-stack mode. BakedDataProvider would implement a trait that can be used with this new function.
- Pro: No global behavior-switching feature flag
- Pro: Code for optimal Postcard and Baked usage can co-exist
- Con: Much more complicated in the type system; we are already pushing the bounds of the type system
- Con: Might negatively affect code size; would need to measure impact
Other?

CC @Manishearth @robertbastian

Update (29-06-2023)

Sub-issues:

Pre-2.0:

[ ] https://github.com/unicode-org/icu4x/issues/3347 (existing issue)
[ ] https://github.com/unicode-org/icu4x/issues/3600

Post-2.0:

[x] https://github.com/unicode-org/icu4x/issues/3599
[ ] https://github.com/unicode-org/icu4x/issues/3598

May 05 '23 20:05 sffc

Two other options that came up during discussion:

ZeroBox: We introduce a type ZeroBox/VarZeroBox which has the semantics of a boxing Cow (by default Cow goes to the ToOwned type). It requires the entire data behind it to be ULE. We sprinkle it in on large stack size data payloads. Tradeoffs:
- Pro: No global flag
- Pro: Can be targeted
- Pro: does not disadvantage postcard over baked
- Con: not a perfect win for Baked since the untargeted cases still take up stack space
- Con: Probably slower as it'll go through ULE machinery
- Con: lots and lots of annoying work making everything ULE or VarULE.
- Con: needs try_new_with_static_provider()
DataPayloadCow: We introduce a type DataPayloadCow that has the semantics of a boxing Cow around DataPayload.
- Pro: can be targeted
- Pro: no global flag
- Pro: no need to ULE everything
- Con: forces allocations on postcard each time
- Con: still an imperfect win for Baked
- Con: needs try_new_with_static_provider()

May 05 '23 20:05 Manishearth

In any of these modes that involves boxing the data struct for postcard users, it would be nice if they could work with a cache of deserialized postcard objects (the cache owns the heap-allocated objects and the formatters reference them out of the cache). Haven't thought through how it would work exactly.

May 05 '23 21:05 sffc

Discuss with:

@Manishearth
@sffc
@robertbastian

May 11 '23 18:05 sffc

Related: https://github.com/unicode-org/icu4x/issues/3020

May 12 '23 16:05 Manishearth

core::mem::size_of::<alloc::borrow::Cow<str>>() is 32 on stable and 24 on nightly. That's nice, but maybe we can get it down to 16 if we make the owned type Box<str> instead of String and find some niche.

May 25 '23 18:05 robertbastian

I don't see a way to get to 16 for a cowable slice. In the non-null case, both the borrowed and the owned variants need two full usizes. The only niche is when the pointer is exactly 0, which is sufficient for Option but not when both variants need 16 bits of storage. So, since we need to overflow into a third word, we may as well keep the capacity.

May 25 '23 21:05 sffc

We could for example make the assumption that our strings will at most be half a usize long.

May 26 '23 07:05 robertbastian

Maybe, but then we technically need fallible conversions from Box<str> and &str into HalfLengthCowStr. Or maybe that's enough of a degenerate case that we can debug-assert and GIGO.

May 26 '23 21:05 sffc

@sffc I think it's a no brainer to try specifically fixing DateTimeFormat on the pre-2.0 timeline. Fancy cross-cutting things like ZeroBox/etc are maybe not worth it
@Manishearth Do we have other things as large as DateTimeFormat?
@sffc not even close. we should write a test in verify-zero-copy or in fingerprint

Talking about map_project issue

@Manishearth in some cases may be able to introduce map_project_maybe_ref(|x| -> y, |&x| -> &y)
@sffc could also introduce a DataPayloadOnlyYoke that avoids the staticref problem
@sffc in datetime we only use map_project for patterns
@sffc Would be much simpler to store patterns in a small stack with heap overflow. So we don't need map_project anymore
When @eggrobin does his refactor; reduce map_projects or make them less impactful

Options for fixing datetimeformat size itself:

Make SolarTwelve a VZV. Maybe also make the other things ZeroVecs. Can be done backwards compatibly but will make old-data-new-code (postcard) cause allocs.
Split symbols further into tiny pieces, one DTF object only needs a small subset of symbols. How does the data loading work??? @eggrobin is somewhat skeptical about this actually reducing size too much due to complex dependencies
Use PatternInterpolator design to making a billion datetimeformat objects, this would imply one per locale (and multiple interpolators per locale each of which is small)
Use some funky DataPayload-derived type in DTF
- DataPayload gets APIs that expose its guts
- DateTimeFormat instead of containing a DataPayload (for symbols) will contain something that is effectively an enum between Box<Yoke> and &'static Symbols. A potential future no-alloc mode would omit the box.
- Can be done backcompat-ily
- If we split the data later we can do some weird enumy things

@sffc For option 1 it's a bunch of tricky custom deser impls, maybe just wait for 2.0 and Better Sliced Symbols.

Let's do number 4 (new data payload) and we're doing number 3 anyway (pattern interpolator). We can do Numbers 1 and 2 in the 2.0 timeframe. Number 2 needs investigation.

Decision: @sffc @Manishearth @younies @robertbastian @skius (@eggrobin :confused:)

Milestones: @sffc hoping to get the new pattern interpolator done in 1.4. New data payload thingy should be Q3, 1.x Priority.

manish to file issues

Jun 29 '23 09:06 Manishearth

Issues filed:

https://github.com/unicode-org/icu4x/issues/3599
https://github.com/unicode-org/icu4x/issues/3598
https://github.com/unicode-org/icu4x/issues/3347 (existing issue)
https://github.com/unicode-org/icu4x/issues/3600

Jun 29 '23 10:06 Manishearth

icu4x icu4x copied to clipboard

Reducing size of DateTimeFormatter and other large objects

icu4x
icu4x copied to clipboard