icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Set VarZeroVec metadata byte to 4

Open sffc opened this issue 1 year ago • 6 comments

This fixes #1410 for the purposes of 1.0 data stability, by setting the metadata byte to 4. This enables us in post-1.0 to make the "flex" version of VarZeroVec a configurable feature for both library and datagen (i.e., a non-default feature zerovec/flex_vzv). Services can always provide metadata byte 4, but data packs for size-constrained devices will be able to use smaller widths.

sffc avatar Jul 29 '22 01:07 sffc

Bench results (no change except 0.7% in the parsing function):

``` Benchmarking vzv_precompute/get/precomputed: Collecting 100 samples in estimated 5.000 vzv_precompute/get/precomputed time: [1.7117 ns 1.7154 ns 1.7204 ns] change: [-0.1460% +0.0893% +0.3296%] (p = 0.47 > 0.05) No change in performance detected. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe

Benchmarking vzv_precompute/get/slice: Collecting 100 samples in estimated 5.0000 s (2 vzv_precompute/get/slice
time: [1.8349 ns 1.8374 ns 1.8403 ns] change: [-0.5059% -0.2485% +0.0018%] (p = 0.06 > 0.05) No change in performance detected. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 1 (1.00%) high severe

 Running benches/zerovec.rs (/usr/local/google/home/sffc/projects/icu4x/target/release/deps/zerovec-2a682270ceaad1e3)

WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0. This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

 Running benches/zerovec_iai.rs (/usr/local/google/home/sffc/projects/icu4x/target/release/deps/zerovec_iai-d36a11db6e05c490)

sum_slice Instructions: 81 (No change) L1 Accesses: 86 (No change) L2 Accesses: 4 (No change) RAM Accesses: 6 (No change) Estimated Cycles: 316 (No change)

sum_zerovec Instructions: 89 (No change) L1 Accesses: 93 (No change) L2 Accesses: 4 (No change) RAM Accesses: 7 (No change) Estimated Cycles: 358 (No change)

binarysearch_slice Instructions: 61 (No change) L1 Accesses: 66 (No change) L2 Accesses: 4 (No change) RAM Accesses: 4 (No change) Estimated Cycles: 226 (No change)

binarysearch_zerovec Instructions: 68 (No change) L1 Accesses: 73 (No change) L2 Accesses: 4 (No change) RAM Accesses: 4 (No change) Estimated Cycles: 233 (No change)

varzeroslice_parse_get Instructions: 410 (-0.726392%) L1 Accesses: 513 (+0.588235%) L2 Accesses: 4 (No change) RAM Accesses: 14 (No change) Estimated Cycles: 1023 (+0.294118%)

varzeroslice_get Instructions: 28 (No change) L1 Accesses: 33 (No change) L2 Accesses: 4 (No change) RAM Accesses: 4 (No change) Estimated Cycles: 193 (No change)

varzeroslice_get_unchecked Instructions: 29 (No change) L1 Accesses: 34 (No change) L2 Accesses: 4 (No change) RAM Accesses: 4 (No change) Estimated Cycles: 194 (No change)

</details>

sffc avatar Jul 29 '22 02:07 sffc

Won't we lose the equality invariant with such a change in the future?

Manishearth avatar Jul 29 '22 13:07 Manishearth

As discussed:

  • The equality invariant is only when comparing a VarZeroVec to another VarZeroVec
  • If the VarZeroVec comes from the same data file, all is okay, and it's most likely that both sides of an equality operation will originate in a data file, because constructing a VarZeroVec at runtime is costly, and we recommend get_by for such cases
  • It could be triggered by using a VarZeroVec as the key of a ZeroMap, but we currently only have string and fixed-sized keys
  • It could also be triggered by using a #[derive(VarULE)] on a type having more than one variable-width field, and then using equality operations on that type
  • If it's truly necessary, we can bump all the data struct version numbers when the change occurs, but it's probably fine so long as we make sure ICU4X code doesn't break
  • I will continue to work on this, but we are out of time for 1.0, and this is a quick way to give us flexibility; I would rather merge this PR and risk an equality invariant breakage than drop the FlexZeroVec change indefinitely

sffc avatar Jul 31 '22 05:07 sffc

I wrote some additional docs on both VarZeroVec and the equality section of VarULE.

The process of writing the docs made me think that another way to handle this would be for the DataRequest to request a VarZeroVec serialization version, in the DataRequestMetadata. The current version could be the "fast" version, and we could introduce a "small" version later that is a bit slower. The validate_byte_slice impl can enforce that the correct form is provided.

sffc avatar Aug 01 '22 06:08 sffc

test failures

Manishearth avatar Aug 01 '22 16:08 Manishearth

This is superseded by #2306. However, there are a few bits and pieces of this PR (like documentation) that I'd like to restore, so I will keep the PR open as draft until I have time to do that.

sffc avatar Aug 02 '22 21:08 sffc

This is superseded by https://github.com/unicode-org/icu4x/issues/2312. See https://github.com/unicode-org/icu4x/issues/1410

sffc avatar Nov 10 '22 18:11 sffc

Re-opening because I said above that I wanted to restore some docs.

sffc avatar Nov 10 '22 18:11 sffc

New PR: #2979

sffc avatar Jan 12 '23 00:01 sffc

Archived in a branch named archive/vzv-fzv-july

sffc avatar Jan 12 '23 00:01 sffc