
Chore: Replace Contains Key with Entry

Open ASuciuX opened this issue 11 months ago • 17 comments

Benchmark Results

hyperfine -w 3 -r 10 "./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000" "./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000" "./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000"
Benchmark 1: ./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      7.977 s ±  0.060 s    [User: 7.584 s, System: 0.365 s]
  Range (min … max):    7.934 s …  8.105 s    10 runs
 
Benchmark 2: ./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      7.918 s ±  0.024 s    [User: 7.524 s, System: 0.364 s]
  Range (min … max):    7.891 s …  7.965 s    10 runs
 
Benchmark 3: ./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      7.946 s ±  0.012 s    [User: 7.554 s, System: 0.362 s]
  Range (min … max):    7.925 s …  7.966 s    10 runs
 
Summary
  ./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000 ran
    1.00 ± 0.00 times faster than ./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000
    1.01 ± 0.01 times faster than ./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000

The benchmarks above show the performance for the following configurations:

  1. default run
  2. only update mutable occurrences
  3. update mutable and immutable as well

The best result is the 2nd case, which updates mutable occurrences and leaves immutable ones as-is.

Applicable issues

  • https://github.com/stacks-network/stacks-core/issues/4316

ASuciuX avatar Mar 05 '24 13:03 ASuciuX

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.41%. Comparing base (25311e4) to head (283f46b). Report is 14 commits behind head on next.

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #4483      +/-   ##
==========================================
- Coverage   85.25%   83.41%   -1.85%     
==========================================
  Files         471      471              
  Lines      337762   337762              
  Branches      317      317              
==========================================
- Hits       287955   281728    -6227     
- Misses      49799    56026    +6227     
  Partials        8        8              
Files Coverage Δ
clarity/src/vm/ast/traits_resolver/mod.rs 98.48% <100.00%> (ø)
clarity/src/vm/costs/mod.rs 81.78% <100.00%> (+0.01%) :arrow_up:
clarity/src/vm/types/mod.rs 91.70% <100.00%> (-0.01%) :arrow_down:

... and 65 files with indirect coverage changes



Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 25311e4...283f46b.

codecov[bot] avatar Mar 05 '24 14:03 codecov[bot]

I've run the benchmark again after the latest commit (a8597f1), compared to the previous one (885b6e5). The results show a 2% decrease in block processing time, up from the initial 1%:

Benchmark 1: ./stacks-inspect-entry-no-inserts replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      6.694 s ±  0.011 s    [User: 6.310 s, System: 0.350 s]
  Range (min … max):    6.680 s …  6.711 s    10 runs
 
Benchmark 2: ./stacks-inspect-entry-inserts replay-block ~/chainstate range 99990 100000
  Time (mean ± σ):      6.587 s ±  0.009 s    [User: 6.227 s, System: 0.325 s]
  Range (min … max):    6.573 s …  6.606 s    10 runs
 
Summary
  ./stacks-inspect-entry-inserts replay-block ~/chainstate range 99990 100000 ran
    1.02 ± 0.00 times faster than ./stacks-inspect-entry-no-inserts replay-block ~/chainstate range 99990 100000

In this case, Benchmark 1 is 885b6e5, and Benchmark 2 is a8597f1.

ASuciuX avatar Mar 08 '24 14:03 ASuciuX

> It takes several (sometimes 10x) lines of code to express that which can be done in a single line with contains_key()

This PR is currently at +275, −247. So there is an increase in line count, but it's pretty small

> Moreover, to use this API, you needed to manually transform lots of code on consensus-critical code paths, which makes merging this PR super risky

It's a simple and straightforward transformation, similar to transforming if statements into match statements. It also lets us remove some ugly .expect() calls like this:

let resp = self
    .requests
    .remove(&req)
    .expect("BUG: had key but then didn't")
    .expect("BUG: had response but then didn't");

IMO this makes the entry API safer to use than indexing a HashMap multiple times with other methods
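For illustration, here's a sketch of how matching on an entry can replace that kind of double-`.expect()`. The types (`u32` keys, `Option<String>` values) and the `take_response` name are hypothetical stand-ins, not the PR's actual code:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical stand-in for the request map: each request id maps to an
// optional response. `Entry::Occupied` proves the key is present, so the
// "BUG: had key but then didn't" expect() disappears.
fn take_response(requests: &mut HashMap<u32, Option<String>>, req: u32) -> Option<String> {
    match requests.entry(req) {
        // Removes the entry and returns its Option<String> value directly
        Entry::Occupied(e) => e.remove(),
        Entry::Vacant(_) => None,
    }
}

fn main() {
    let mut requests = HashMap::new();
    requests.insert(1u32, Some("response".to_string()));
    assert_eq!(take_response(&mut requests, 1), Some("response".to_string()));
    // The key was removed by the first call, so a second call finds nothing
    assert_eq!(take_response(&mut requests, 1), None);
}
```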

The unit and integration tests are currently passing, but we can also do some extra testing here

> Finally, the speed improvements are basically nonexistent (a questionable 1% speedup?), which means the wins from the PR are basically nonexistent

The numbers in the description are incorrect; it's about a 2% speedup in block processing. See the second comment on this issue

jbencin avatar Mar 11 '24 15:03 jbencin

> It's a simple and straightforward transformation, similar to transforming if statements into match statements. It also lets us remove some ugly .expect() calls like this:

I'm really not seeing how this:

match foo.entry(bar) {
   Entry::Occupied(value) => {
      /* do something */
   }
   Entry::Vacant(e) => {
      /* do something else */
   }
}

is a legibility improvement over this:

if foo.contains_key(&bar) {
   /* do something */
} else {
   /* do something else */
}

The former has two levels of indentation, whereas the latter has one. Also, the former has 6 lines of boilerplate, whereas the latter has 3. We've been trying to reduce gratuitous indentations for the past few years because it aids in comprehensibility, but this PR works against it.

Do we know for sure that using the Entries API is required to gain that 2% speed improvement? Because, if the transformation is really that straightforward, then the compiler really should be doing that internally (or have an option to do so).

jcnelson avatar Mar 11 '24 16:03 jcnelson

> I'm really not seeing how this... is a legibility improvement over this

I don't know about that. Sure, the second example is less text, but the first is more explicit because both cases are clearly labeled, rather than using an else for one case. This example also misses that, when using Entry, the code inside these blocks can be much more readable because there's no need to access the HashMap again. Consider this example from the PR:

if !self.prune_inbound_counts.contains_key(prune) {
    self.prune_inbound_counts.insert(prune.clone(), 1);
} else {
    let c = self.prune_inbound_counts.get(prune).unwrap().to_owned();
    self.prune_inbound_counts.insert(prune.clone(), c + 1);
}

vs.

match self.prune_inbound_counts.entry(prune.to_owned()) {
    Entry::Occupied(mut e) => {
        *e.get_mut() += 1;
    }
    Entry::Vacant(e) => {
        e.insert(1);
    }
}

Line count went up, but the second is fewer characters and much more readable. In the first, you have to read it and think about what it's doing, whereas it's obvious as soon as you look at the second that you're incrementing a counter. Also, there's no unwrap() needed when using the Entry
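For what it's worth, the standard library can compress that increment even further with the `and_modify`/`or_insert` chain, still with a single hash lookup. A self-contained sketch on a plain `HashMap` (the field name is borrowed from the example above; the rest is hypothetical):

```rust
use std::collections::HashMap;

fn main() {
    let mut prune_inbound_counts: HashMap<String, u64> = HashMap::new();
    let prune = "neighbor-1";

    // One entry lookup per call: bump the counter if present, else start at 1.
    for _ in 0..2 {
        prune_inbound_counts
            .entry(prune.to_owned())
            .and_modify(|c| *c += 1)
            .or_insert(1);
    }

    assert_eq!(prune_inbound_counts[prune], 2);
}
```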

> Do we know for sure that using the Entries API is required to gain that 2% speed improvement?

The hyperfine stacks-inspect replay-block benchmark we use is very consistent between runs, so I'm confident this represents a real performance gain. This is not surprising: it halves the number of hashes needed in some places.

The perf gains are probably entirely from the few changes in clarity. We should at least merge those. I'm not sure the other changes in stackslib are even used in the replay-block benchmark

> Because, if the transformation is really that straightforward, then the compiler really should be doing that internally (or have an option to do so)

You'd think, but during my testing I've been surprised by what the compiler doesn't optimize away at -O3

jbencin avatar Mar 11 '24 21:03 jbencin

> The perf gains are probably entirely from the few changes in clarity. We should at least merge those. I'm not sure the other changes in stackslib are even used in the replay-block benchmark

Yup, 100% onboard with this. Now that you mention it, it was surprising that these changes impacted code outside of stacks_common and clarity.

> Line count went up, but the second is fewer characters and much more readable. In the first, you have to read it and think about what it's doing, whereas it's obvious as soon as you look at the second that you're incrementing a counter. Also, there's no unwrap() needed when using the Entry

The example you're citing here could also be refactored as follows, with less boilerplate and the same number of clones:

if let Some(c) = self.prune_inbound_counts.get_mut(prune) {
   *c += 1;
} else {
   self.prune_inbound_counts.insert(prune.clone(), 1);
}

Granted, there's a lot of really old code in the stackslib/src/net directory that was written before I had a good handle on Rust, and it could really use a spit-shine.

jcnelson avatar Mar 12 '24 02:03 jcnelson

Also, I think that using if let with .entry() is also the cleanest way to insert a key iff it doesn't exist already. Another example from the PR:

if !lowest_reward_cycle_with_missing_block.contains_key(nk) {
    lowest_reward_cycle_with_missing_block.insert(nk.clone(), reward_cycle);
}

vs.

if let Entry::Vacant(e) = lowest_reward_cycle_with_missing_block.entry(nk.clone()) {
    e.insert(reward_cycle);
}
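The same insert-if-absent pattern can also collapse into `or_insert`, which leaves an existing value untouched. A sketch with hypothetical types (`String` keys, `u64` reward cycles), not the PR's actual maps:

```rust
use std::collections::HashMap;

fn main() {
    let mut lowest_reward_cycle_with_missing_block: HashMap<String, u64> = HashMap::new();
    let nk = "neighbor-key".to_string();

    // Inserts only if the key is absent.
    lowest_reward_cycle_with_missing_block
        .entry(nk.clone())
        .or_insert(42);
    // No effect: the key is already present, so 42 is kept.
    lowest_reward_cycle_with_missing_block
        .entry(nk.clone())
        .or_insert(99);

    assert_eq!(lowest_reward_cycle_with_missing_block[&nk], 42);
}
```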

Aside from the Clarity perf gains, there are some clear readability improvements and .unwrap() or .expect() eliminations in this PR that I think are worth keeping as well

jbencin avatar Mar 12 '24 14:03 jbencin

I disagree -- I think it requires more cognitive overhead. Instead of reading "okay, insert this thing into the hash table if it doesn't have it", I'm reading "okay, get this opaque pointer-that-is-not-a-pointer thing ("entry") via a let-binding, and then call this deref-and-store-that-is-not-a-deref-and-store function on it ("insert") to put this thing into the hash table". This is why I avoid the Entry API in my code -- I think it creates a needless layer of indirection that can be avoided with the containers' APIs.

That all said, I'm not really keen to pursue the bike-shedding further since I don't think we're going to ever come to agreement. Also, it's off-topic.

The purpose of the workstream to which this PR belongs is to improve the speed of the Clarity VM. A few comments ago, you mentioned that the speedup you see does not appear to come from the introduction of the Entries API (or, if it does, there's no need to be touching files outside of clarity and perhaps stacks_common). When viewed only through this lens, without regard to the Entries API, this PR is not achieving this: it's making changes that do not improve the Clarity VM, and it's making changes to things outside of the Clarity VM. As such, I'm leaving my review as-is until these two problems are addressed.

jcnelson avatar Mar 12 '24 16:03 jcnelson

@jcnelson I've removed all the modifications outside of the clarity package. Can you please review it?

cc: @jbencin

ASuciuX avatar Mar 15 '24 21:03 ASuciuX

@jcnelson @jbencin I've removed the .collect() as it was failing the build, now it should work. Can you please re-approve this?

ASuciuX avatar Mar 22 '24 17:03 ASuciuX

> I assume this was failing to compile because of a merge conflict?

There were no previous merge conflicts. What I saw was that the .collect() was still present when I created this PR, but on the latest next it had been removed; that's why the compile was failing. I manually resolved it today before updating the branch, so I didn't see any conflicts when re-syncing.

ASuciuX avatar Mar 22 '24 23:03 ASuciuX

It looks like it's because I changed probe_for_generics() to accept impl Iterator in #4500

jbencin avatar Mar 23 '24 01:03 jbencin

> I've requested (via Slack) that we try to retain the readability of contains_key via a new trait or method

Do you have a suggestion for how to do that?

jbencin avatar Mar 27 '24 15:03 jbencin

> I've requested (via Slack) that we try to retain the readability of contains_key via a new trait or method
>
> Do you have a suggestion for how to do that?

Yeah I gave a few to @ASuciuX on Slack -- but if it proves difficult I'm happy to take it.

cylewitruk avatar Mar 27 '24 15:03 cylewitruk

If there is a more appropriate place for the traits and implements please let me know and I will move them.

ASuciuX avatar Apr 12 '24 21:04 ASuciuX

> If there is a more appropriate place for the traits and implements please let me know and I will move them.

If @cylewitruk can merge his StacksHashMap changes, that would be the best place to put it. Or if not, this should go in its own file, maybe in stacks-common/src/util or stacks-common/src/types

jbencin avatar Apr 15 '24 19:04 jbencin

> If @cylewitruk can merge his StacksHashMap changes, that would be the best place to put it. Or if not, this should go in its own file, maybe in stacks-common/src/util or stacks-common/src/types

I might actually suggest starting a new dir stacks_common/src/ext where all "extension" traits can be placed. This pattern can be applied to a lot of repetitive and/or verbose code.

If StacksHashMap can merge, then the logic can be placed directly in an impl; extension traits are more of a way to extend the API for non-local types.
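As a sketch of what such an extension trait could look like (the `CountExt` name and `increment` method are purely illustrative, not an actual stacks-core API):

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical extension trait that restores a contains_key-style, one-line
// call site while using the entry API internally. Name and location are
// illustrative only (e.g. something like stacks_common/src/ext).
trait CountExt<K> {
    /// Increment the counter for `key`, inserting 1 if the key is absent.
    fn increment(&mut self, key: K);
}

impl<K: Eq + Hash> CountExt<K> for HashMap<K, u64> {
    fn increment(&mut self, key: K) {
        *self.entry(key).or_insert(0) += 1;
    }
}

fn main() {
    let mut counts: HashMap<&str, u64> = HashMap::new();
    counts.increment("a");
    counts.increment("a");
    counts.increment("b");
    assert_eq!(counts["a"], 2);
    assert_eq!(counts["b"], 1);
}
```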

cylewitruk avatar Apr 16 '24 07:04 cylewitruk