
Zero-copy datetime skeletons (DateSkeletonPatternsV1, datetime/skeletons@1)

Open Manishearth opened this issue 2 years ago • 11 comments

part of https://github.com/unicode-org/icu4x/issues/876, part of https://github.com/unicode-org/icu4x/issues/856

DateSkeletonPatternsV1 is one of our final remaining non-zero-copy types. It's a rather complicated tree:

pub struct DateSkeletonPatternsV1<'data>(
    pub LiteMap<SkeletonV1, PatternPlurals<'data>>,
);

pub struct SkeletonV1(pub Skeleton);
pub struct Skeleton(pub(crate) SmallVec<[fields::Field; 5]>);

// Has a ULE type already
pub struct Field {
    pub symbol: FieldSymbol,
    pub length: FieldLength,
}

pub enum PatternPlurals<'data> {
    /// A collection of pattern variants for when plurals differ.
    MultipleVariants(PluralPattern<'data>),
    /// A single pattern.
    SinglePattern(Pattern<'data>),
}

pub struct PluralPattern<'data> {
    /// The field that 'variants' are predicated on.
    pivot_field: Week,
    pub(crate) zero: Option<Pattern<'data>>,
    pub(crate) one: Option<Pattern<'data>>,
    pub(crate) two: Option<Pattern<'data>>,
    pub(crate) few: Option<Pattern<'data>>,
    pub(crate) many: Option<Pattern<'data>>,
    pub(crate) other: Pattern<'data>,
}

pub struct Pattern<'data> {
    pub items: ZeroVec<'data, PatternItem>,
    pub(crate) time_granularity: TimeGranularity,
}

There are two parts to this: first the skeleton needs to be made zero-copy, then PatternPlurals. Both need to implement VarULE or ULE to work inside a ZeroMap.

Skeleton

Skeleton seems easy: we replace Skeleton with #[make_varule] struct Skeleton(ZeroVec<'data, Field>). I'm a bit worried that this will make lookup rather slow (since Ord will be slower), but as far as I can tell we never use the LiteMap as an actual map at use time; we only iterate the map in order.

Another option is to replace Skeleton with the unparsed skeleton string.
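As a rough illustration of the flat-skeleton idea, here is a std-only sketch. It does not use the zerovec crate: `SkeletonBytes`, `encode`, and the one-byte `symbol`/`length` fields are hypothetical stand-ins, and the 2-byte record layout is an assumption rather than the real ULE layout of `Field`. The point is only that a borrowed byte slice can be validated and iterated in order without allocating.

```rust
/// Hypothetical stand-in for `fields::Field`: one byte each for symbol and length.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Field {
    pub symbol: u8,
    pub length: u8,
}

/// Borrowed view over a flat buffer of 2-byte field records (assumed layout).
/// The derived `Ord` is a plain lexicographic byte comparison, which may not
/// match the ordering of the existing `Skeleton` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SkeletonBytes<'data>(&'data [u8]);

impl<'data> SkeletonBytes<'data> {
    /// Checks that the buffer holds whole records; does not copy it.
    pub fn try_from_bytes(bytes: &'data [u8]) -> Option<Self> {
        if bytes.len() % 2 == 0 {
            Some(Self(bytes))
        } else {
            None
        }
    }

    /// Decodes fields lazily, in stored order, straight from the borrowed bytes.
    pub fn fields(&self) -> impl Iterator<Item = Field> + 'data {
        let bytes: &'data [u8] = self.0;
        bytes.chunks_exact(2).map(|pair| Field {
            symbol: pair[0],
            length: pair[1],
        })
    }
}

/// Datagen-side encoding step; this one allocates, the reader above does not.
pub fn encode(fields: &[Field]) -> Vec<u8> {
    fields.iter().flat_map(|f| [f.symbol, f.length]).collect()
}
```

If the map is only ever walked in order, the cost of comparing two such skeletons matters mostly at datagen time, when the entries are sorted.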

PatternPlurals

I'd rather not write a custom ULE type here, but ultimately, we can. I do think, however, we can get most of the benefits by restructuring this type a bit.

Basically, this type can be structured as

#[make_varule]
struct PatternPlurals<'data> {
    pivot_field: Option<Week>, // this needs a ULE impl for Option, which can be done.
    patterns: VarZeroVec<'data, VarZeroSlice<Option<Pattern>>>,
}

#[make_varule]
pub struct Pattern<'data> {
    pub(crate) time_granularity: TimeGranularity,
    pub items: ZeroVec<'data, PatternItem>,
}

We will need an AsULE implementation on Option<T: AsULE> as well as a VarULE implementation on Option<T: VarULE> in cases where we can guarantee that T never has length zero (this can be done with an additional trait).
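A minimal std-only sketch of the "empty means None" idea follows. The trait `NeverEmpty`, the functions `write_option` and `parse_option`, and `ToyPattern` are all hypothetical names invented for illustration; none of them are zerovec APIs. The toy pattern puts a one-byte `time_granularity` first, so its encoding is always at least one byte long, which is the kind of guarantee the extra trait would need to express.

```rust
/// Hypothetical marker trait: implementors promise that `write_to` always
/// appends at least one byte, so an empty slice can stand for `None`.
pub trait NeverEmpty<'data>: Sized {
    fn parse(bytes: &'data [u8]) -> Option<Self>;
    fn write_to(&self, out: &mut Vec<u8>);
}

/// Toy stand-in for the restructured `Pattern`; items are kept as raw bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyPattern<'data> {
    pub time_granularity: u8,
    pub items: &'data [u8],
}

impl<'data> NeverEmpty<'data> for ToyPattern<'data> {
    fn parse(bytes: &'data [u8]) -> Option<Self> {
        // The leading granularity byte must exist; the rest is borrowed as-is.
        let (first, rest) = bytes.split_first()?;
        Some(Self { time_granularity: *first, items: rest })
    }

    fn write_to(&self, out: &mut Vec<u8>) {
        out.push(self.time_granularity);
        out.extend_from_slice(self.items);
    }
}

/// Encodes `None` as zero bytes and `Some(v)` as `v`'s own encoding.
pub fn write_option<'data, T: NeverEmpty<'data>>(value: Option<&T>, out: &mut Vec<u8>) {
    if let Some(v) = value {
        let start = out.len();
        v.write_to(out);
        assert!(out.len() > start, "NeverEmpty contract violated");
    }
}

/// Outer `None` means invalid bytes; inner `None` means an absent value.
pub fn parse_option<'data, T: NeverEmpty<'data>>(bytes: &'data [u8]) -> Option<Option<T>> {
    if bytes.is_empty() {
        Some(None)
    } else {
        T::parse(bytes).map(Some)
    }
}
```

The design choice here is that no discriminant byte is spent on the `Option`; the price is that every `T` has to uphold the non-empty contract.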

Then, PatternPlurals::patterns becomes a vector whose first element must be non-None; the remaining elements can be absent or None (or we can guarantee that it has either length 1 or length 6).
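The invariant could be checked roughly as below. This is a sketch under assumptions not stated in the issue: patterns are shown as plain `&str`, the slot order puts `other` first (`SLOT_ORDER` is hypothetical), and a missing or None slot falls back to the first pattern. It implements the looser variant (1 to 6 slots, first one present) rather than the strict "length 1 or 6" alternative.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PluralCategory {
    Zero,
    One,
    Two,
    Few,
    Many,
    Other,
}

/// Assumed slot order: `other` leads so the required pattern is always slot 0.
const SLOT_ORDER: [PluralCategory; 6] = [
    PluralCategory::Other,
    PluralCategory::Zero,
    PluralCategory::One,
    PluralCategory::Two,
    PluralCategory::Few,
    PluralCategory::Many,
];

/// Checks the looser invariant: 1..=6 slots, and the first slot is present.
pub fn validate(patterns: &[Option<&str>]) -> Result<(), &'static str> {
    if patterns.is_empty() || patterns.len() > SLOT_ORDER.len() {
        return Err("expected between 1 and 6 slots");
    }
    if patterns[0].is_none() {
        return Err("first slot must hold a pattern");
    }
    Ok(())
}

/// Picks the pattern for `category`, falling back to slot 0 when the slot
/// is absent or None. Returns None only if slot 0 itself is missing.
pub fn select<'a>(patterns: &[Option<&'a str>], category: PluralCategory) -> Option<&'a str> {
    let fallback = patterns.first().copied().flatten()?;
    let idx = SLOT_ORDER.iter().position(|c| *c == category)?;
    Some(patterns.get(idx).copied().flatten().unwrap_or(fallback))
}
```

With this shape, a single-pattern entry is just a vector of length 1, and every category resolves to that one pattern.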

Thoughts?

Feedback requested:

  • [x] @sffc
  • [x] @zbraniecki
  • [x] @nordzilla
  • [x] @gregtatum

Manishearth, Mar 08 '22 22:03