icu4x
icu4x copied to clipboard
Favor SmallVec over Box in pluralrules
This will help avoid heap allocations as most lists are only a few items long. The only boxed slice that was retained was the SampleRanges, as this can be of an indeterminate length.
Resolves #193 Depends on #140 Depends on #107
Pull Request Test Coverage Report for Build 073a029aff20c87865f0223c207d4a4e19125dac-PR-285
- 7 of 10 (70.0%) changed or added relevant lines in 2 files are covered.
- 1 unchanged line in 1 file lost coverage.
- Overall coverage increased (+11.2%) to 77.341%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| components/pluralrules/src/rules/ast.rs | 0 | 3 | 0.0% |
| <!-- | Total: | 7 | 10 |
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| components/locale/src/serde/langid.rs | 1 | 88.24% |
| <!-- | Total: | 1 |
| Totals | |
|---|---|
| Change from base Build c08d490c92c0187ee5504b8505d5fac7387d8675: | 11.2% |
| Covered Lines: | 2990 |
| Relevant Lines: | 3866 |
💛 - Coveralls
It seems that the linter is not liking what this does to the RulesSelector, as it's now 2480 bytes for the Conditions(PluralRuleList) entry. One mitigation would be to heap allocate the PluralRuleList in a Box<PluralRuleList>. It's a heap allocation, but it might have better memory locality, although... that's just a guess.
Thanks for the contribution!
Can you provide figures on how "cargo bench" differs before and after this PR?
Boxing the PluralRuleList seems like a good idea. It's one heap allocation, but that's better than the dozens that were there before.
It does seem to have a nice impact on the microbenchmark:
plurals/parser/parse time: [1.1205 us 1.1219 us 1.1233 us]
change: [-27.377% -26.828% -26.328%] (p = 0.00 < 0.05)
Performance has improved.
In terms of long term maintenance, there's an effort now to introduce stack based vector - https://github.com/rust-lang/rfcs/pull/2990 which, if successful, is a stepping stone to a hybrid vector like smallvec.
So I'm cautiously optimistic that we can use smallvec in performance sensitive cases where we commonly have small vectors but want to be ready for them to be larger (like this) and eventually we'll be able to remove the dependency on smallvec.
I rebased @gregtatum patch on top of master and checked it against the total benchmarks:
operands/total time: [965.73 ns 971.94 ns 981.35 ns]
change: [-5.7538% -3.2698% -1.1855%] (p = 0.00 < 0.05)
Performance has improved.
parser/total time: [1.1470 us 1.1600 us 1.1776 us]
change: [-37.270% -36.208% -35.143%] (p = 0.00 < 0.05)
Performance has improved.
pluralrules/total time: [85.550 us 85.883 us 86.292 us]
change: [+4.7499% +6.1966% +7.6202%] (p = 0.00 < 0.05)
Performance has regressed.
I was able to reproduce that change (improvement on parsing by over 30% and regression on full selection by ~5-10%) in 5 test runs. I do trust the results to be reliable.
I dove deeper:
zbraniecki@zibi-xps13:~/projects/icu4x/components/pluralrules$ cargo criterion pluralrules --features bench-pluralrules
Compiling icu-num-util v0.0.1 (/home/zbraniecki/projects/icu4x/components/num-util)
Compiling icu-pluralrules v0.0.1 (/home/zbraniecki/projects/icu4x/components/pluralrules)
Finished bench [optimized] target(s) in 7.11s
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Gnuplot not found, using plotters backend
pluralrules/construct/fs
time: [143.34 us 143.71 us 144.19 us]
change: [+5.2929% +6.2346% +7.1295%] (p = 0.00 < 0.05)
Performance has regressed.
pluralrules/select/fs time: [1.7583 us 1.7621 us 1.7660 us]
change: [-0.1753% +0.9524% +2.0192%] (p = 0.09 > 0.05)
No change in performance detected.
Gnuplot not found, using plotters backend
pluralrules/total time: [85.585 us 86.826 us 91.162 us]
change: [+11.223% +10.677% +14.487%] (p = 0.00 < 0.05)
Performance has regressed.
I'm not sure what's going on here really. My first hypothesis was that the regression is in selection because accessing AST nodes is slower, but the deep benchmarks show clearly that the regression is on parsing, not selection.
can someone reproduce?
One hypothesis that could explain that is that my parsing/total benchmark only tests pl rules and is not representative of what pluralrules/construct and pluralrules/total are testing since they all construct for 10 langs: https://github.com/unicode-org/icu4x/blob/master/components/pluralrules/benches/fixtures/plurals.json#L10
We may want to get rid of the pl rules in benchmark fixtures, and just parse rules for those 10 locales as a more representative case.
If that hypothesis holds, then this PR helps Polish rules (by 30%) but slows down rules for 10 locales (by 5-10%). Both are syntentic microbenchmarks, but that would make me be against landing this change.
Ok, I'm fairly certain that this is the case. The path forward is for #227 to be resolved and then all plural rules tests will use the generated data for the 10+/- locales we select for all benchmarks.
For this PR, I suggest @gregtatum that you test against the plural rules subset or, if you want to help clean up benchmarks, take on the PR to move parser/total to parse conditions for 10 locales that are used everywhere else. This will give us more consistent results.
My PR in #295 is coalescing benchmarks to be more reliable by testing against 10 locales, instead of just one. Shane in #296 introduces "testdata" which will give us a single source of those locales and data to be used in all components tests/examples/benchmarks.
This should help us reason about this PR, so I'd suggest now to wait for those two to land.
It seems to reproduce on my machine (macOS).
plurals/parser/parse time: [1.5964 us 1.6061 us 1.6164 us]
change: [-59.684% -59.233% -58.768%] (p = 0.00 < 0.05)
Performance has improved.
plurals/convert+select/fs
time: [1.4706 us 1.4817 us 1.4945 us]
change: [+7.0000% +8.2399% +9.8204%] (p = 0.00 < 0.05)
Performance has regressed.
plurals/select/fs time: [1.4470 us 1.4546 us 1.4636 us]
change: [+6.6509% +8.0535% +9.3887%] (p = 0.00 < 0.05)
Performance has regressed.
plurals/construct/fs/10 time: [379.03 us 381.91 us 385.46 us]
change: [+1.8176% +2.8942% +4.1765%] (p = 0.00 < 0.05)
Performance has regressed.
Is PR #295 the one you were suggesting I look at, @zbraniecki ? I'd be happy to follow-up with some more work here, but my cadence will be a bit slower, as I'm doing this on the side. Or were you suggesting a separate issue #?
Is PR #295 the one you were suggesting I look at, @zbraniecki ? I'd be happy to follow-up with some more work here, but my cadence will be a bit slower, as I'm doing this on the side. Or were you suggesting a separate issue #?
No worries. Yeah, #295 is what I had in mind. Let's wait for it to land and then get back to this PR and decide on it! In the meantime, if you'll have spare cycles and want to grab any of those https://github.com/unicode-org/icu4x/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22 , let us know!
Can you rebase this PR on top of master?
Notice: the branch changed across the force-push!
- components/pluralrules/Cargo.toml is different
~ Your Friendly Jira-GitHub PR Checker Bot
Can you rebase this PR on top of master?
Sure thing.
Depends on #140 Depends on #107
This may also be a solution once it gets merged? https://github.com/rust-lang/rfcs/pull/2990
Notice: the branch changed across the force-push!
- Cargo.lock is now changed in the branch
- components/pluralrules/Cargo.toml is no longer changed in the branch
- components/pluralrules/src/rules/ast.rs is no longer changed in the branch
- components/pluralrules/src/rules/mod.rs is no longer changed in the branch
- components/pluralrules/src/rules/parser.rs is no longer changed in the branch
- components/plurals/Cargo.toml is now changed in the branch
- components/plurals/src/rules/ast.rs is now changed in the branch
- components/plurals/src/rules/mod.rs is now changed in the branch
- components/plurals/src/rules/parser.rs is now changed in the branch
~ Your Friendly Jira-GitHub PR Checker Bot
Now that #1240 has landed, should we close this PR and the corresponding ticket as obsolete?
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
This is obsolete because now we use VarZeroVec for runtime.