icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Favor SmallVec over Box in pluralrules

Open gregtatum opened this issue 5 years ago • 22 comments

This will help avoid heap allocations as most lists are only a few items long. The only boxed slice that was retained was the SampleRanges, as this can be of an indeterminate length.

Resolves #193 Depends on #140 Depends on #107

gregtatum avatar Oct 01 '20 20:10 gregtatum

Pull Request Test Coverage Report for Build 073a029aff20c87865f0223c207d4a4e19125dac-PR-285

  • 7 of 10 (70.0%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+11.2%) to 77.341%

Changes Missing Coverage Covered Lines Changed/Added Lines %
components/pluralrules/src/rules/ast.rs 0 3 0.0%
<!-- Total: 7 10
Files with Coverage Reduction New Missed Lines %
components/locale/src/serde/langid.rs 1 88.24%
<!-- Total: 1
Totals Coverage Status
Change from base Build c08d490c92c0187ee5504b8505d5fac7387d8675: 11.2%
Covered Lines: 2990
Relevant Lines: 3866

💛 - Coveralls

coveralls avatar Oct 01 '20 20:10 coveralls

It seems that the linter is not liking what this does to the RulesSelector, as it's now 2480 bytes for the Conditions(PluralRuleList) entry. One mitigation would be to heap allocate the PluralRuleList in a Box<PluralRuleList>. It's a heap allocation, but it might have better memory locality, although... that's just a guess.

gregtatum avatar Oct 01 '20 22:10 gregtatum

Thanks for the contribution!

Can you provide figures on how "cargo bench" differs before and after this PR?

Boxing the PluralRuleList seems like a good idea. It's one heap allocation, but that's better than the dozens that were there before.

sffc avatar Oct 01 '20 22:10 sffc

It does seem to have a nice impact on the microbenchmark:

plurals/parser/parse    time:   [1.1205 us 1.1219 us 1.1233 us]                                  
                        change: [-27.377% -26.828% -26.328%] (p = 0.00 < 0.05)
                        Performance has improved.

zbraniecki avatar Oct 01 '20 23:10 zbraniecki

In terms of long term maintenance, there's an effort now to introduce stack based vector - https://github.com/rust-lang/rfcs/pull/2990 which, if successful, is a stepping stone to a hybrid vector like smallvec.

So I'm cautiously optimistic that we can use smallvec in performance sensitive cases where we commonly have small vectors but want to be ready for them to be larger (like this) and eventually we'll be able to remove the dependency on smallvec.

zbraniecki avatar Oct 01 '20 23:10 zbraniecki

I rebased @gregtatum patch on top of master and checked it against the total benchmarks:

operands/total          time:   [965.73 ns 971.94 ns 981.35 ns]                            
                        change: [-5.7538% -3.2698% -1.1855%] (p = 0.00 < 0.05)
                        Performance has improved.

parser/total            time:   [1.1470 us 1.1600 us 1.1776 us]                          
                        change: [-37.270% -36.208% -35.143%] (p = 0.00 < 0.05)
                        Performance has improved.

pluralrules/total       time:   [85.550 us 85.883 us 86.292 us]                              
                        change: [+4.7499% +6.1966% +7.6202%] (p = 0.00 < 0.05)
                        Performance has regressed.

I was able to reproduce that change (improvement on parsing by over 30% and regression on full selection by ~5-10%) in 5 test runs. I do trust the results to be reliable.

I dove deeper:

zbraniecki@zibi-xps13:~/projects/icu4x/components/pluralrules$ cargo criterion pluralrules --features bench-pluralrules
   Compiling icu-num-util v0.0.1 (/home/zbraniecki/projects/icu4x/components/num-util)
   Compiling icu-pluralrules v0.0.1 (/home/zbraniecki/projects/icu4x/components/pluralrules)
    Finished bench [optimized] target(s) in 7.11s

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Gnuplot not found, using plotters backend
pluralrules/construct/fs                                                                            
                        time:   [143.34 us 143.71 us 144.19 us]
                        change: [+5.2929% +6.2346% +7.1295%] (p = 0.00 < 0.05)
                        Performance has regressed.

pluralrules/select/fs   time:   [1.7583 us 1.7621 us 1.7660 us]                                   
                        change: [-0.1753% +0.9524% +2.0192%] (p = 0.09 > 0.05)
                        No change in performance detected.

Gnuplot not found, using plotters backend
pluralrules/total       time:   [85.585 us 86.826 us 91.162 us]                              
                        change: [+11.223% +10.677% +14.487%] (p = 0.00 < 0.05)
                        Performance has regressed.

I'm not sure what's going on here really. My first hypothesis was that the regression is in selection because accessing AST nodes is slower, but the deep benchmarks show clearly that the regression is on parsing, not selection.

zbraniecki avatar Oct 02 '20 17:10 zbraniecki

can someone reproduce?

zbraniecki avatar Oct 02 '20 17:10 zbraniecki

One hypothesis that could explain that is that my parsing/total benchmark only tests pl rules and is not representative of what pluralrules/construct and pluralrules/total are testing since they all construct for 10 langs: https://github.com/unicode-org/icu4x/blob/master/components/pluralrules/benches/fixtures/plurals.json#L10

We may want to get rid of the pl rules in benchmark fixtures, and just parse rules for those 10 locales as a more representative case.

zbraniecki avatar Oct 02 '20 17:10 zbraniecki

If that hypothesis holds, then this PR helps Polish rules (by 30%) but slows down rules for 10 locales (by 5-10%). Both are syntentic microbenchmarks, but that would make me be against landing this change.

zbraniecki avatar Oct 02 '20 17:10 zbraniecki

Ok, I'm fairly certain that this is the case. The path forward is for #227 to be resolved and then all plural rules tests will use the generated data for the 10+/- locales we select for all benchmarks.

For this PR, I suggest @gregtatum that you test against the plural rules subset or, if you want to help clean up benchmarks, take on the PR to move parser/total to parse conditions for 10 locales that are used everywhere else. This will give us more consistent results.

zbraniecki avatar Oct 02 '20 18:10 zbraniecki

My PR in #295 is coalescing benchmarks to be more reliable by testing against 10 locales, instead of just one. Shane in #296 introduces "testdata" which will give us a single source of those locales and data to be used in all components tests/examples/benchmarks.

This should help us reason about this PR, so I'd suggest now to wait for those two to land.

zbraniecki avatar Oct 03 '20 17:10 zbraniecki

It seems to reproduce on my machine (macOS).

plurals/parser/parse    time:   [1.5964 us 1.6061 us 1.6164 us]
                        change: [-59.684% -59.233% -58.768%] (p = 0.00 < 0.05)
                        Performance has improved.

plurals/convert+select/fs
                        time:   [1.4706 us 1.4817 us 1.4945 us]
                        change: [+7.0000% +8.2399% +9.8204%] (p = 0.00 < 0.05)
                        Performance has regressed.

plurals/select/fs       time:   [1.4470 us 1.4546 us 1.4636 us]
                        change: [+6.6509% +8.0535% +9.3887%] (p = 0.00 < 0.05)
                        Performance has regressed.

plurals/construct/fs/10 time:   [379.03 us 381.91 us 385.46 us]
                        change: [+1.8176% +2.8942% +4.1765%] (p = 0.00 < 0.05)
                        Performance has regressed.

gregtatum avatar Oct 05 '20 13:10 gregtatum

Is PR #295 the one you were suggesting I look at, @zbraniecki ? I'd be happy to follow-up with some more work here, but my cadence will be a bit slower, as I'm doing this on the side. Or were you suggesting a separate issue #?

gregtatum avatar Oct 05 '20 13:10 gregtatum

Is PR #295 the one you were suggesting I look at, @zbraniecki ? I'd be happy to follow-up with some more work here, but my cadence will be a bit slower, as I'm doing this on the side. Or were you suggesting a separate issue #?

No worries. Yeah, #295 is what I had in mind. Let's wait for it to land and then get back to this PR and decide on it! In the meantime, if you'll have spare cycles and want to grab any of those https://github.com/unicode-org/icu4x/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22 , let us know!

zbraniecki avatar Oct 05 '20 17:10 zbraniecki

Can you rebase this PR on top of master?

zbraniecki avatar Oct 05 '20 22:10 zbraniecki

Notice: the branch changed across the force-push!

  • components/pluralrules/Cargo.toml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Can you rebase this PR on top of master?

Sure thing.

gregtatum avatar Oct 07 '20 15:10 gregtatum

Depends on #140 Depends on #107

sffc avatar Nov 19 '20 18:11 sffc

This may also be a solution once it gets merged? https://github.com/rust-lang/rfcs/pull/2990

zbraniecki avatar Jan 07 '21 18:01 zbraniecki

Notice: the branch changed across the force-push!

  • Cargo.lock is now changed in the branch
  • components/pluralrules/Cargo.toml is no longer changed in the branch
  • components/pluralrules/src/rules/ast.rs is no longer changed in the branch
  • components/pluralrules/src/rules/mod.rs is no longer changed in the branch
  • components/pluralrules/src/rules/parser.rs is no longer changed in the branch
  • components/plurals/Cargo.toml is now changed in the branch
  • components/plurals/src/rules/ast.rs is now changed in the branch
  • components/plurals/src/rules/mod.rs is now changed in the branch
  • components/plurals/src/rules/parser.rs is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Now that #1240 has landed, should we close this PR and the corresponding ticket as obsolete?

sffc avatar Nov 22 '21 20:11 sffc

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Sep 01 '22 06:09 CLAassistant

This is obsolete because now we use VarZeroVec for runtime.

sffc avatar Nov 10 '22 18:11 sffc